


THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 








Volume XXXVII September, 1946 Number 6 








THE USE OF MEASURES OF ABILITY AND GENERAL
ADJUSTMENT IN THE PRESERVICE SELECTION OF
NURSERY SCHOOL-KINDERGARTEN-PRIMARY TEACHERS*


ELIZABETH MECHEM FULLER 
Institute of Child Welfare 


University of Minnesota 


Industries and professions are depending more and more upon 
scientific methods for selection of personnel. The increased spe- 
cialization of training and lengthening of the training period 
representative of this era make early and wise choice of trade
or profession imperative. Fortunately, the field of applied 
psychology has to some extent kept pace with the greater need 
for improved methods of personnel selection and has provided 
many valid and reliable tests and techniques to meet this need. 

The teaching profession, however, has not yielded readily usable 
information on teacher selection or appraisal through the use of 
standardized tests and measurements. There is rather general 
agreement that selection of teachers, particularly at the preservice 
level, is complicated by the fact that there is no one clear pattern 
for a successful teacher. Two recent graduates of the same 
teacher-training institution who were rated highly successful in 
their student-teaching are described below to illustrate what
different patterns of superior teaching are found within the same
teacher training class:


* This study reports a part of a larger study conducted jointly with
Florence L. Goodenough and Edna Olson, which is to be published separately.

Jean Anderson’s average score on the Student-Teacher
Rating Scale is 9.2 on a scale where 10 is the best possible
rating. Jean is an unusually attractive student with a
delightful personality and good taste in dress. She gets 
along well with everyone, and exercises democratic leader- 
ship. She has an average academic record in non-major 
subjects and superior marks in courses in her major depart- 
ment. She does excellent work with children, and is very 
well liked by them. She is gay and out-going. Her sug- 
gestions are usually accepted and acted upon by children. 
She seems to make decisions effortlessly, almost instinctively. 
She has art and musical talent, sings very well, and plays 
the saxophone. Her enthusiasm is an inspiration to co-
workers. She is sincere, straightforward, coöperative, and
dependable. She is tactful in staff relationships and exer- 
cises mature judgment in her work with children. Jean is 
self-confident and meets the challenge of teaching with no 
evidence of insecurity. Her style of teaching is highly in- 
formal and she keeps the children happily engaged in several 
activities concurrently. She contributes to a high level of 
morale on the staff. Jean attended nine out of ten student- 
teachers’ meetings, reported promptly and regularly for 
duty, and frequently offered to perform extra tasks or stay 
overtime to complete work. 


Louise Dow earns an average score of 9.7 on the Student- 
Teacher Rating Blank. Louise is a plain-looking, soft-spoken 
student. She is quiet and self-effacing, and prefers to follow 
rather than to lead. She makes suggestions timidly and her 
merits frequently are not appreciated at first acquaintance. 
Her written work is much superior to her class discussions. 
She is quick to understand children’s needs. She is con- 
scientious, firm, and consistent. She usually manages to be 
in the proper place at the proper time. She has an excellent 
knowledge of educational theory, and studies children and 
their records carefully before making decisions. Her teach- 
ing tends to be slightly formal, and she does everything in an 
orderly pre-planned way. She frequently brings in reference 
materials for group consideration. She looks for cause and 
effect factors in all situations and arrives at conclusions 
analytically. She plans to do graduate work after teaching 
two or three years. Louise reported for student-teaching 
every day, and attended all teachers’ meetings. 

Similarly, those judged as unsuccessful in their student-teaching
rarely resemble each other in any marked or systematic way.
Elsie Williams and Mary Paul, described below, were members 
of the same graduating class as Jean Anderson and Louise Dow. 
Elsie and Mary were both rated as unsuccessful student-teachers. 
Their descriptions illustrate the contrast in their pattern of 
teaching, even though their low ratings were identical: 


Elsie Williams’ average score on the Student-Teacher Rat- 
ing Blank is 4.0. Elsie is a very large, awkward girl who 
seems to compensate for personal unattractiveness with 
aggressive and domineering behavior. She is over-stimulat- 
ing to the children and her presence overpowers and stifles 
them. She has a breezy, boastful manner. She has trans- 
ferred from one college to another three times in three 
years. She is adventuresome and a non-conformist. For 
example, she takes long hitch-hikes alone on week-ends, and 
once went to Europe on a tramp steamer. She excels in 
several sports, and has entered state competitive sport con- 
tests on several occasions. She prefers men friends to 
women friends. Her choice of clothing is bizarre and she 
makes excessive use of cosmetics. Her work lacks organiza- 
tion, and she frequently forgets duties, in fact forgets to 
report for duty. When questioned she is given to alibis. 
Elsie over-rates her own abilities and when suggestions are 
made for improving her work she is offended and claims to be 
misunderstood. She rarely attempts to use suggestions of 
supervisors or to apply knowledge gained from textbook 
study. Rather, she reacts on the basis of impulse, or even 
superstition. Her academic record is below average; she is 
usually behind in her work; her grades keep her in constant 
fear of failure. Elsie is unreliable, and has to be checked to 
see that she performs even the smallest duty. She has no 
respect for rules and regulations, and does just enough work 
to ‘get by.’ She improvises in record-keeping to the extent 
that none of the data in her teaching records can be used. 
She attended only four out of ten teachers’ meetings and on 
two of those occasions left before the meetings were over. 


Mary Paul earned an average score of 4.0 on the Student- 
Teacher Rating Blank. Mary is the ‘poor little thing’ type, 
who generally looks as though she is about to cry. She is
self-righteous and always justifies her decisions on the basis of 
some previous experience. Her standards and sense of per- 
spective are unusual. For example, one morning a button 
came off her suit coat while she was on duty. She left the
group unsupervised to go to the lounge to sew on the button 
immediately. Mary can repeat educational theory ‘parrot 
fashion,’ but never applies it to specific situations. She 
is a ‘lone wolf,’ ignored by her contemporaries, and side- 
stepped by the children. She monopolizes supervisors’ time 
discussing her work, but is indifferent to her weaknesses in 
that she ‘shrugs them off’ and does not utilize supervisors’ 
advice. Then she invariably complains of her marks ‘hurting
her feelings.’ She is nervous, flighty and highly distractable.
Mary tries hard but usually misses her mark in
everything concerned with teaching. She can look right at a 
situation needing teacher attention and not realize that she 
should do anything. If she is told to do something, she 
does it with a vengeance to the exclusion of all else, and with
great concern for the letter of the law. One day her super- 
visor asked her to ‘clean off’ one of the tables on which 
scraps of paper had been left. Mary got a pan of warm 
water, soap, and a cloth, and scrubbed the table top for 
fifteen minutes, even after the table was needed for another 
group activity; she was oblivious to everything else while 
cleaning the table. Her performance suggests a combination 
of low general ability, extreme effort, and poor judgment. 
Mary attended all teachers’ meetings, and was present every 
day at her student-teaching assignment. 


To complicate teacher selection, the problem of appraisal has 
not been solved. There are those who would argue for the salary 
criterion; those who would base success on advancement to posi- 
tions entailing greater responsibility; those who would deem suc- 
cessful the teachers who leave the classroom to administer or 
supervise; those who base their judgment upon progress made 
by the children; those who would depend entirely upon ratings 
of administrators and supervisors. None of the above criteria
seem to tell the whole story.

Thus, it is obvious that there is need for further careful study of
teacher-training and selection methods if we are to increase
understanding of the profession in all its aspects. Pre-selection,
training, selection, and appraisal are to be thought of as inter- 
related and interdependent, actually as temporal phases of the 
appraisal of the same person and of the same job. It is difficult 
to develop good aptitude or interest tests for teachers when 
standards in the teaching positions themselves differ so widely. 
It is difficult to train teachers for positions where standards of 
appraisal differ so widely. It is difficult to predict teaching suc- 
cess when we do not know what factors are correlated closely 
with that success. It is difficult to rate performance when so 
many marginal duties only remotely related to classroom teach- 
ing are included in the teacher ratings used in the typical com- 
munity. Care must be taken, then, to fit the pretest program 
to the training; the training to the job; the selection to the 
job; and the appraisal to factors known to be related to successful 
teaching. 

This paper explores some methods for selecting and appraising 
those individuals who will do reasonably well in their training, 
become good teachers when judged on broad standards, and get 
along reasonably well with co-workers and other members of the
community. For reasons just mentioned, it would be difficult, 
and perhaps impossible, to devise a single test or other diagnostic 
instrument that would predict success in all areas of a field where
the requirements are so diverse and the criteria of achievement 
are at once so vaguely formulated as to principles and in so many
instances so specific in their details. This study represents a
preliminary attempt to identify some of the measures which might
be predictive of success in teaching. If it serves to point the 
road along which further exploration may yield profitable results 
it will have served its immediate purpose. 


DESCRIPTION OF SUBJECTS 


The subjects consisted of the students who were members of 
two consecutive graduating classes in the nursery school-kinder- 
garten-primary teacher training curriculum at the University of 
Minnesota. There were one hundred eighteen students in all, 
but not all students were involved in all parts of the study, so that 
the numbers used in measurement data do not agree throughout.
It was decided to retain all subjects even where data are incomplete,
to increase reliability wherever possible. Some of the
measures involved only one of the senior classes, while others 
utilized a merged list of the two classes. In analyzing and inter- 
preting data, the groups acting as subjects will be indicated. One 
of the graduating classes will be designated as Group A, and the 
other as Group B. In cases where the two classes have been 
merged, the resultant group will be designated as Group AB. 

The one hundred eighteen subjects were all girls ranging in age 
from nineteen to thirty-five, with a mean age of twenty-one years 
and five months. 

At the time of application for admission to the University of 
Minnesota, all of the subjects had to submit complete records of 
their grades in high school and were admitted as freshmen in the 
College of Education if their high-school average grades were in 
the twenty-fifth percentile or above among graduates of the high 
school concerned. The median percentiles on high-school grade
averages for the group of one hundred eighteen subjects appear
in Table I.

Entering freshmen also take the American College Entrance 
examination* (A.C.E.) if they have not already done so in high 
school. Therefore, A. C. E. percentiles were available for all 
subjects and also appear in Table I. 


TABLE I.—PERCENTILE RANKS ON HIGH-SCHOOL GRADES AND
AMERICAN COLLEGE ENTRANCE EXAMINATIONS FOR GROUP
A, GROUP B, AND GROUP AB

                   High-School Grade     A. C. E. Examination
                   Percentiles           Percentiles
Group A    N       49                    53
           Mdn.    78                    44
           Range   33-100                2-87
Group B    N       43                    46
           Mdn.    70                    47
           Range   16-100                2-96
Group AB   N       92                    99
           Mdn.    76                    46
           Range   16-100                2-96





* Distributed by the American Council on Education, 744 Jackson Place, 
Washington, D. C. 

The median percentile in high-school grades for all entering
freshmen women in the College of Education for the year in which 
Group A entered was 54. Therefore, Group A with a median 
percentile of 78 in high-school grades is considerably superior to 
College of Education entering freshmen women in general. 
Some of this superiority may be due to the fact that Group A 
includes graduating seniors only, and their high-school grades 
were examined retrospectively after they earned senior standing, 
while the general College of Education norm included all students 
who entered that year regardless of whether they remained in the 
University to achieve comparable upper class status. Neverthe- 
less, it is unlikely that the difference in sampling would completely 
account for the superiority. 

The median percentile in high-school grades for entering fresh- 
men women in the College of Education for the year in which 
Group B entered was 63. Group B, therefore, with a median per- 
centile of 70, also maintains a slight advantage over other entering 
freshmen women in the College of Education. 

The median percentile for Group A on the American College 
Entrance Examination was 44. The median percentile of enter- 
ing freshmen women in the College of Education for that year 
was 34. The corresponding figure for Group B was 47 as com- 
pared to the general median of 33. 

Consequently, both groups A and B represent a fairly high level 
of ability as measured by their standings within their own schools 
in high-school grades. They represent a fairly average group
when compared with University of Minnesota entering freshmen 
as measured by the American College Entrance examinations. 
They exceed other College of Education freshmen women in both 
high-school grades and American College Entrance examinations. 

At the end of their sophomore year all subjects took the Sophomore
Culture Tests, composed of the American Council on
Education Coöperative General Culture Test (Revised Series,
Form P)¹ and Coöperative English Test (Form PM),¹ and the
Miller Analogies Test (Form A).² Table 2 presents the distribution
of percentiles for all subjects on these three tests. Percentiles
used are based upon the spring and fall quarter 1939
norms for University of Minnesota Education sophomores.
The non-academic norms were derived from tests of education
students majoring in elementary education, physical education,
industrial education, music education, art education, and home
economics and agricultural education. The academic norms
were derived from tests of education students majoring in social
studies, English, mathematics, science, and language.

¹ Distributed by Coöperative Test Service, 15 Amsterdam Ave., New
York, N. Y.

² Used locally at University of Minnesota Student Counselling Bureau.
Not obtainable commercially.


TABLE 2.—PERCENTILE RANKS ON SOPHOMORE CULTURE TESTS
FOR GROUP A, GROUP B, AND GROUP AB

                   Total Culture               Total English               Miller Analogies
                   Academic      Non-academic  Academic      Non-academic  Academic      Non-academic
                   percentiles   percentiles   percentiles   percentiles   percentiles   percentiles
Group A    N       39            46            50            56            51            57
           Md.     16            38            31            64.5          32            55
           Range   1-84          4-98          6-87          22-97         4-99          4-100
Group B    N       41            44            44            47            45            47
           Md.     32            52.5          29            57            32            55
           Range   1-93          1-99          4-97          11-100        2-96          2-99
Group AB   N       80            90            94            103           96            104
           Md.     21.5          43            29.5          64            32            55
           Range   1-93          1-99          4-97          11-100        2-99          2-100























Thus, it may be seen that both groups earn scores on the
General Culture Test, the Coöperative English Test, and the Miller
Analogies Test considerably below average (16 to 32 percentiles) 
when compared with sophomore academic majors and approxi- 
mately average (38 to 64.5 percentiles) when compared with 
sophomore non-academic majors enrolled in the University of 
Minnesota College of Education. 

As the study progressed, it seemed desirable to obtain more 
descriptive data concerning the subjects, so during their senior 
year, Group A was given the Bell Adjustment Inventory,¹ and the
Army Alpha Intelligence Test (Revised Form 5).²

On the Bell Adjustment Inventory, scores are classified into 
five general categories descriptive of the subjects’ personal adjust- 
ment to home, social, health, and emotional factors: Very Unsatis- 
factory, Unsatisfactory, Average, Good, Excellent. Among the 
forty-three Group A seniors who took the Bell Adjustment 
Inventory, scores are classified as follows: Very Unsatisfactory, 
2; Unsatisfactory, 5; Average, 19; Good, 16; Excellent, 1. 

The Army Alpha Intelligence Test (Revised Form 5) was 
administered in order to complement the Miller Analogies data in 
obtaining a measure of the general intellectual level of Group
A. The Revised Form 5 has not been used widely enough as yet 
for norms to be available for comparable groups of subjects. It 
will be used here merely to supplement the general description 
of Group A. Scores for Group A ranged from 124 to 186 with a 
mean score of 156.7. Available norms for groups used thus far to 
standardize the Revised Form 5 are expressed in terms of per- 
centiles. The mean score of 156.7 obtained by Group A repre- 
sents a percentile of 97 as compared with the subjects in the 
general population who have been tested on the Revised Form 5. 
Since no student in Group A obtained a score below the eighty- 
ninth percentile, it is probable that the available norms are not 
applicable to a college group. 

Rabin and Weinik* describe the results of tests administered to 
nursing students at the New Hampshire State Hospital. For the 
ninety-two nursing students who took the Revised Form 5, the 
scores ranged from 56 to 187, with a mean score of 133.4. Group A
subjects exceeded the scores for the student nurses by a con- 
siderable margin. Eighty-seven per cent of Group A seniors 
obtained scores at or above the median score of the nurses. Thus, 
the average intelligence level of Group A seniors is considerably
above that of the general population and above that of the student
nursing group at the New Hampshire State Hospital. On the
basis of the Miller Analogies data previously described, the
intelligence level of Group A is about the same as that of sophomore
non-academic majors in the College of Education at the University
of Minnesota.

¹ Distributed by the Publications Office, Leland Stanford Junior
University, California.

² Distributed by the Psychological Corporation, 522 Fifth Ave., New
York, N. Y.

* Rabin, A. I., and Weinik, H. M. “The Nebraska Army Alpha and the
Comparative Strength of Factors V, N, and R in Nursing Students.” J.
Genet. Psychol., 34, 197-202, 1946.


METHOD OF JUDGING TEACHING SUCCESS 


For the purposes of this paper, success in student-teaching is 
determined by the composite judgments of those persons directly 
connected with the student-teaching course. Typically, judg- 
ments of at least six persons who had observed the student’s work 
in the classroom are included: three room-teachers under whom 
students work during their three quarters of student-teaching; the 
nursery school or kindergarten supervisor, the student-teaching 
assistant supervisor, and the student-teaching course instructor. 
Each student is rated by the above-mentioned persons on the 
Student-Teacher Rating Blank from which the following sample 
items were taken: 


STUDENT TEACHER RATING BLANK
INSTITUTE OF CHILD WELFARE, UNIVERSITY OF MINNESOTA

Name of Student Teacher                Rated by                Date

Instructions: Check each question. Ability or disability on one item should
not color judgment on another.

1. How does her personal appearance impress you?
     Left of scale:    Makes poor impression
     Middle of scale:  Makes average impression
     Right of scale:   Always well groomed, shows excellent taste
     ( ) No opportunity to observe

2. Does she show initiative and ambition; ability to direct her own
   activities?
     Left of scale:    Requires prodding to get her to work; shirks
                       responsibility
     Middle of scale:  Works cheerfully under supervision but requires
                       occasional help
     Right of scale:   Finds things to do without supervision and works
                       on own initiative
     ( ) No opportunity to observe

3. Will she be likely to grow in effectiveness?
     Left of scale:    Seldom seeks new method or materials
     Middle of scale:  Modifies practices to some extent
     Right of scale:   Constantly studying, seeking better materials and
                       methods
     ( ) No opportunity to observe

4. Is she aware of individual pupil differences?
     Left of scale:    Unaware of individual differences and instructs all
                       children similarly
     Middle of scale:  Shows some awareness and tries to adapt instruction
                       to situation
     Right of scale:   Keenly aware of differences and clever at
                       individualizing instruction
     ( ) No opportunity to observe


A numerical score on the rating blank was obtained by assign- 
ing scale values from zero to ten along the horizontal lines, with 
zero representing the least desirable rating (left end of scale) and 
ten the most desirable rating (right end of scale). By averaging 
the scale positions checked, a single score was obtained on each 
rating blank for each student. The reverse side of the rating 
blank was used for a subjective characterization of the student’s 
work and for any additional pertinent information. 
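
The scoring rule just described can be sketched in a few lines of Python. This is an illustration only, not the procedure as actually carried out: the function names are hypothetical, the treatment of items checked "No opportunity to observe" (here simply skipped) is an assumption, and averaging the blank scores of several raters into one figure per student is likewise assumed rather than stated in the text.

```python
from statistics import mean
from typing import Optional, Sequence


def rating_blank_score(checks: Sequence[Optional[float]]) -> Optional[float]:
    """Average the checked scale positions on one rating blank.

    Each position runs from 0 (least desirable, left end of the scale)
    to 10 (most desirable, right end). None stands for an item checked
    "No opportunity to observe"; skipping such items is an assumption.
    """
    observed = [c for c in checks if c is not None]
    for c in observed:
        if not 0 <= c <= 10:
            raise ValueError(f"scale position {c} is outside 0-10")
    return mean(observed) if observed else None


def student_average(blanks: Sequence[Sequence[Optional[float]]]) -> Optional[float]:
    """Average the single scores from several raters' blanks (assumed step)."""
    scores = [s for s in (rating_blank_score(b) for b in blanks) if s is not None]
    return mean(scores) if scores else None


# Hypothetical blank with four items, one of them not observed.
one_blank = rating_blank_score([9, 10, None, 8])

# Hypothetical student rated by two raters on two items each.
two_raters = student_average([[9, 10], [8, 9]])
```

Under these assumptions a blank with no observed items yields no score at all, rather than a score of zero, so that an unobserved student is not penalized.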

In final appraisal of teaching success, the criteria included were: 


1) Average scores on the Student-Teacher Rating Blanks. 

2) The characterizations on the reverse side of the rating
blanks. These characterizations included the rater’s subjective 
description of her general opinion of the student’s work and any 
incidents which seemed illustrative of her teaching performance. 

3) Attendance at student-teacher meetings, and at student- 
teaching assignment. Students were expected to report regularly, 
unless excused because of illness. It was considered reasonable 
for students to miss one student-teacher meeting each quarter 
(eleven weeks) and to have three days of sick leave (which cor- 
responds to the amount ordinarily earned in teaching positions). 
Absences beyond these standards, unless unavoidable and 
arranged in advance, were considered indicative of irresponsibility. 

4) Participation in discussions at student-teacher meetings. 
The agenda for student-teacher meetings frequently called for 
discussion of problems encountered in the practice work. The 
extent of interest or willingness to take part in these meetings 
constituted another factor used in the final ratings. 

5) General attitude toward student-teaching experience. 
Attitudes considered as positive include such items as willingness 
to perform extra duties if necessary, extracurricular efforts such 
as out-of-school study of children’s records, and voluntary col- 
lection of materials for classroom presentation. Negative items 
include proneness to report late for duty, complaints concerning 
travel to and from schools, and minimization of importance of
practice work. 

6) Coöperation with instructors and supervisors. Each student
met with instructors and supervisors several times during the
year to discuss course work and progress in practice. A general 
rating was made of the student’s acceptance of advice and 
response to suggestions. 

7) Professional ethics and conduct. Students were judged 
upon such items as their respect for the confidential nature of 
professional records, abstinence from gossip concerning children, 
teachers, schools, or families, respect for regulations regarding 
such things as smoking on school premises, or choice of clothing, 
and straightforwardness in business affairs related to obtaining 
teaching positions. 

In order to get a single measure of teaching success the instruc- 
tor in student-teaching and her assistant examined all available 
data for each student and then each arranged the students into a 
rank order list. The average of the two rank orders assigned a 
student by the instructor and assistant was used to obtain a final 
rank order list, which was used thereafter to represent the depart- 
mental judgment as to teaching success. For sample descrip- 
tions of two students rated high and two students rated low 
according to the method described above, the reader is referred 
to the first section of this paper. 
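
The merging of the two rank order lists may be sketched as follows. The student labels are hypothetical, and the handling of tied averages (broken here alphabetically by label) is an assumption, since the text does not say how ties were resolved.

```python
from typing import Dict, List


def final_rank_order(instructor: Dict[str, int], assistant: Dict[str, int]) -> List[str]:
    """Order students by the average of two judges' rank positions.

    Rank 1 is taken to be the most successful student (an assumption).
    Ties in the averaged rank are broken by student label, which is
    purely illustrative.
    """
    if instructor.keys() != assistant.keys():
        raise ValueError("both judges must rank the same students")
    averaged = {s: (instructor[s] + assistant[s]) / 2 for s in instructor}
    return sorted(averaged, key=lambda s: (averaged[s], s))


# Hypothetical ranks assigned independently by the two judges.
instructor_ranks = {"S1": 1, "S2": 2, "S3": 3, "S4": 4}
assistant_ranks = {"S1": 2, "S2": 4, "S3": 1, "S4": 3}

merged = final_rank_order(instructor_ranks, assistant_ranks)
```

In this hypothetical case the averaged ranks are 1.5, 3, 2, and 3.5, so the final list runs S1, S3, S2, S4.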


RELATION OF STUDENT-TEACHING TO OTHER FACTORS 


Once the students were rated with regard to their success in 
student-teaching as described in the preceding section, the prob- 
lem of relating student-teaching skill to other measured factors 
was considered. Wherever possible subjects were retained to 
increase numbers, so the numbers involved in the relationships 
described in this section will differ because not all students were 
measured on each test. 

Correlation coefficients were computed to describe relationships 
between the following measures: 

1) For the combined Group AB (eighty-five students), per- 
centile ranks in high-school academic averages and rank order in 
student-teaching, the Pearsonian r was +.03. That is, no signifi- 
cant relationship seemed to exist between high-school grades and 
success in student-teaching. 







2) For Group AB (ninety-three students), percentile ranks for 
American College Entrance examination scores and rank order in 
student-teaching, the Pearsonian r was -.06. Therefore, no
significant relationship appeared in Group AB between A. C. E.
examinations and student-teaching success.

3) For the combined Group AB (ninety-five students), per- 
centile ranks for Miller Analogies scores and merged rank order in 
student-teaching (where Group A and Group B rank order lists in 
student-teaching were combined into a single merged rank order 
list), the Pearsonian r was +.017. No significant relationship 
emerged, therefore, between Miller Analogies scores and student- 
teaching. 

4) For the combined Group AB (eighty-six students), percentile
ranks for the American Council on Education Coöperative
General Culture Test scores and merged rank order in student- 
teaching, the Pearsonian r was +.13, indicating little relationship 
between them. 

5) For Group A and Group B (ninety students), percentile
ranks for the American Council on Education Coöperative
English Test (Form PM) scores and rank order in student-teaching
within their own senior class, the Pearsonian r was -.10.
Consequently, Coöperative English Test scores showed no significant
relationship to student-teaching.

6) For Group A (fifty-three students), rank order in honor 
point ratio for grades earned in major (professional) subjects dur- 
ing their junior and senior years and rank order in student-teach- 
ing, the Pearsonian r was +.62. Thus, the first measure which 
showed any relationship which might be predictive of success in 
student-teaching was the academic average in professional sub- 
jects, which is not ordinarily obtainable until the end of the 
student’s third year in college. 

7) Students in Group A (forty-five) were asked early in April of 
their senior year (after they had completed two quarters of their 
student-teaching) to rate themselves on the Student-Teacher 
Rating Blank. These self-ratings were then related to the ratings 
of the same forty-five students made by instructors and super- 
visors. The Pearsonian r between the average self-ratings of 
Group A (forty-five students), and rank order in student-teaching 
within their senior class was +.09. This result was somewhat 
surprising because all students had individual interviews with 
supervisors after each observation of their work, at which time
strong and weak points were discussed frankly, and students were 
given a general idea of their standing. 
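
For readers who wish to retrace the computations, a minimal sketch of a Pearsonian r between a test measure and rank order in student-teaching follows. The data and names are hypothetical. Only students measured on both variables are paired, in keeping with the varying numbers noted above. The direction of the rank scale is an assumption: if rank 1 denotes the most successful student, a favorable relationship with a percentile score appears as a negative r unless the ranks are reversed first, and the text does not state which convention was followed.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def paired_values(measure: Dict[str, float],
                  rank: Dict[str, float]) -> Tuple[List[float], List[float]]:
    """Keep only students who have both a test measure and a teaching rank."""
    common = sorted(measure.keys() & rank.keys())
    return [measure[s] for s in common], [rank[s] for s in common]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: S4 has a percentile but no teaching rank and is dropped.
percentiles = {"S1": 90, "S2": 50, "S3": 10, "S4": 70}
teaching_rank = {"S1": 1, "S2": 2, "S3": 3}

x, y = paired_values(percentiles, teaching_rank)
r = pearson_r(x, y)
```

With rank 1 taken as best, these hypothetical figures give a perfect negative r, which illustrates why the sign of any reported coefficient must be read against the direction of the rank scale.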

This study, then, provides further corroboration of the findings 
of other investigators in that little or no relationship was found 
between success in student-teaching and other factors such as 
high-school grades, college entrance examinations, general 
intelligence, culture test scores, personal adjustment, and self- 
evaluations. Pearsonian r’s obtained between student-teaching 
and these other factors ranged from +.13 to -.17. Undoubtedly
a certain minimum of intelligence, linguistic skill, general informa- 
tion, personal adjustment, and academic accomplishments are 
needed prerequisites for success in the teaching field. However, 
present college entrance requirements automatically exclude 
those who fall below this minimum. Among the selected group 
who are able to meet college entrance requirements, it would 
appear that variations in other factors are of greater significance 
than variation within the purely abstract accomplishments that 
remain within this limited group. It would seem, then, that 
those persons concerned with the preservice selection of nursery 
school-kindergarten-primary teachers would have to seek criteria 
other than those investigated in this paper upon which to base 
their judgments of aptitude for teaching. 

The descriptions of the ability and general adjustment of the 
students in the University of Minnesota nursery school-kinder- 
garten-primary teacher training course have been reported in 
detail as a by-product of this study to add to the recent reports on 
the quality of students attending teacher-training institutions. 

Results also suggest that further critical examination of present 
methods of rating teachers both in training and in service is 
needed. Most supervisors agree that any completely objective 
or completely subjective method of rating teachers fails in some 
respects. Nevertheless, an increased understanding of objective 
and subjective factors in appraising teachers is needed if apprais- 
als are to have any systematic and universal interpretation. 








THE USE OF THE GOODENOUGH SPEED-OF- 
ASSOCIATION TEST IN THE PRESERVICE 
SELECTION OF NURSERY SCHOOL- 

FLORENCE L. GOODENOUGH, ELIZABETH MECHEM FULLER,
AND EDNA OLSON 


Institute of Child Welfare 


University of Minnesota 


The theory underlying the use of word-association tests in the 
study of personality characteristics is too well known to require 
elaboration here. The choice of stimulus words for a free associa- 
tion test is, however, an important matter which, up to the present 
time, has received far too little attention. Some years ago it 
occurred to one of the writers that unusually significant material 
in this field might be secured by confining the list of stimulus 
words to homographs—words which, in their written form, have 
two or more different derivations and consequently two or more 
different meanings. 

A relatively small amount of experimentation with about one 
hundred words of this type was sufficient to demonstrate: First, 
that the choice of a semantic category did not occur at random, 
but certain choices were far more likely to be made by some 
groups of individuals than by others. Secondly, it was found 
that there was a sufficient amount of internal consistency in 
respect to the pattern of choice made by a given individual to 
provide further evidence that such choices were not merely cir- 
cumstantial in nature but were related to basic and relatively 
stable characteristics of the personality. It was accordingly 
deemed worth while to see whether or not it would be possible 
to develop keys for scoring responses in terms of certain stated 
characteristics of personality. The general method employed 
was to compare the responses to each word of a list of 238 homo- 
graphs given by individuals who had previously been classified 
on the basis of some external criterion. The first key which 
was developed was designed to appraise the extent to which the 
thought processes of an individual as indicated by the character 
of his semantic choices conform to or depart from those usual for 
his sex; in other words, to appraise what has sometimes been
called ‘mental masculinity or femininity.’ The second key was 
designed to appraise leadership ability in women. The third, 
called the ‘commonality’ key, indicates the extent of socialization
as opposed to individualization of the subject’s habits of thinking. 
A preliminary report on some of the results obtained with the 
M-F key has already been published* and a more recent study in 
which later findings are presented and summarized will appear in a 
forthcoming issue of Science. More complete data with respect
to results obtained from all three keys, together with those based
upon less formal methods of analyzing the material, will be
presented in a monograph now in preparation which is to be pub- 
lished by the University of Minnesota Press. At the moment we 
shall note only that the scores obtained by the use of each of these 
keys with relatively homogeneous groups show a high degree of 
internal consistency. The correlations between sums of alternate 
items in each case are in the neighborhood of +.90 and a similar 
figure has been obtained for retests after approximately a year’s 
interval. In the case of the M-F key, the scores show a clear 
separation between the sexes. The overlap of the male and 
female distributions is well under ten per cent. In the case of the 
leadership key, the overlap between the distribution of scores 
made by WAC officers and privates is about twelve per cent. 
The scoring of the responses, although not an entirely mechan- 
ical process, is on the whole very objective. In practically all 
cases the nature of the response leaves little doubt as to which
of the different meanings of the stimulus word was selected by
the subject. The word ‘kind,’ for example, is used both as a
noun and as an adjective. In the first instance the reference is to 
‘variety’ or ‘sort’ and in this case the typical responses are usually 
either synonymous abstract nouns such as ‘type’ or particulariz- 
ing phrases such as ‘of candy.’ In the second instance the 
references are equally clear. They include for the most part 
synonymous adjectives such as ‘nice’ or ‘kindly’ and nouns to 
which the adjective presumably refers as ‘friend’ or ‘mother.’ 
The success of the first keys encouraged us to hope that the
same device might be applied to the baffling task of preservice
selection of teachers. Inasmuch as there seems to be general
agreement that personality factors play an exceedingly important
part in teaching success, and especially so in the teaching of young
children, and since the association method had proved so useful
as a means of appraising those aspects of personality which had
previously been investigated, the experiment seemed at least
worth trying.

* Goodenough, Florence L. “The Use of Free Association in Objective
Measurement of Personality.” Chapter 5. Studies in Personality.
Contributed in Honor of Lewis M. Terman by his former students.
New York: McGraw-Hill Co., 1942.

As originally planned, the procedure was to be essentially the 
same as that which had been employed in the derivation of the 
other keys. The test was to be administered to a reasonably 
large group of nursery-school, kindergarten, or primary-grade 
teachers whom their supervisors regarded as more than usually 
successful and to a second group of comparable size who, by the 
same criterion, were looked upon as unsuccessful. On the basis
of the differences in the responses to the same list of words by the 
two groups, a key for predicting success in teaching young children 
was to be derived. 

The first difficulty encountered was that of locating and testing 
the unsuccessful teachers. Supervisors in general were quite will- 
ing to name teachers whom they regarded as superior, and since 
the test can be self-administered and as a rule requires only about 
fifteen to twenty minutes for its completion, practically all of 
these teachers readily responded to a mailed request to fill out the 
test blank and return it to us. But many supervisors felt that 
it would not be ethical to name teachers whom they regarded as 
unsuccessful, even though they had been assured that such infor- 
mation would be entirely confidential and that the names were 
desired only for research purposes, the nature of which was 
explained. Moreover, it was found that the percentage of 
returned blanks from this group of teachers was far smaller than 
that from those regarded as successful, a fact which reduced the 
available data to a still smaller figure and very possibly intro- 
duced a selective factor as well. Because of the wide range of 
responses to each word when free choice is permitted (see fre- 
quency lists prepared by Kent and Rosanoff) a large number 
of cases is essential for the derivation of dependable weights for 
the various responses. Otherwise it becomes impossible to score 
more than a small proportion of the responses likely to be found in 
any individual test paper, since so many of these responses will 
not have occurred among those given by the standardization
group with sufficient frequency to warrant assigning a score-value 
to them. It may be noted in this connection that the M-F key 
previously mentioned was based upon a standardization group 
of eight hundred cases while the leadership and commonality 
keys were standardized upon one thousand cases each. We were 
able to secure almost one hundred records from teachers regarded 
as ‘superior’ but only twenty from those classed as ‘inferior.’ 
This was clearly too few to yield dependable standards. Never- 
theless, inspection of the records suggested that differential 
trends existed. 

Because of the small number of ‘poor’ teachers for whom tests 
were available, it seemed necessary to discard our original plan 
and to make the comparison on some other basis. Also there was 
the problem arising from what obviously were very different 
standards of teaching success in different institutions. Our first 
plan, therefore, was to see what could be obtained from a study 
based entirely upon student-teachers in the nursery school- 
kindergarten-primary curriculum at the University of Minne- 
sota. All seniors and juniors majoring in this field in 1945, 
numbering one hundred two cases in all,* were given the test and 
their responses were compared with those of an equal number of 
women students in the College of Literature, Science, and the 
Arts from the same institution. From this comparison a key 
was derived according to the same general method as had been 
used in the previous studies. 

When individual papers were scored according to this key and 
the results for the sixty seniors were compared with supervisors’
estimates of their success in practice-teaching, the results in 
many ways seemed decidedly encouraging. With two excep- 
tions, the scores of the twenty students who were ranked highest 
as to teaching success tended to be distinctly higher than those 
earned by the twenty who were given an intermediate rank, while 
the majority of the lowest third earned scores that were decidedly 
inferior to those obtained by the other two groups. Discon- 
certingly enough, however, the two top-ranking persons, accord- 
ing to the judgments of their supervisors, had scores among the 
lowest of the entire group while the girl with the lowest rank-position
tested at approximately the 75th percentile. The internal consistency
of the test was only moderate (+.64) when judged by the correlation
between odd-numbered and even-numbered items. However, an
examination of the data made it clear that one of the major reasons
for this comparatively low figure was the small number of cases
used in deriving weights. The number of responses on the average
paper to which it had been possible to assign scores averaged only
seventy-eight out of a total of two hundred thirty-eight words,
and even for those it was recognized that many of the weights
were unreliably determined.†

* Sixteen additional students were tested later, bringing the total number
of students tested to one hundred eighteen.

† In deriving the M-F and the Leadership keys, the reliability of the
Chi-square value was used as a criterion for weighting. A difference between
the two criterion groups that reached the one-tenth of one per cent level of
probability was given a weight of 5; the one per cent level was accorded a
weight of 4; the two per cent level, 3; the five per cent level, 2; and the ten
per cent level, 1. Responses for which the chances of recurrence were fewer
than one in ten received no score. On the average paper the number of
scored responses on each of these keys for a number of different groups
averages about 190-195 of the 238 words.

In standardizing the teacher’s key, the small number of cases used in
determining the percentages made the weighting far less reliable. In general
we attempted to avoid some measure of this difficulty by using no
responses in which (a) the total number of responses falling in a given
category was fewer than twenty or (b) the contrast between the two groups
giving the response in question was less than one to two. However, these
criteria are probably too lenient. Also there was undoubtedly some tendency
on our part to define the categorical groupings in too broad terms in
order to make the number of cases falling within a single group sufficiently
large to permit scoring. Thus responses were thrown together which
should have been kept separate, and this also reduced the reliability of the
weighted scores.

A second plan for deriving weights in which the responses of the
student-teachers were compared with those of the 1000 WAC’s
used in deriving the leadership and commonality keys was then
tried. To the one hundred two blanks from the University of
Minnesota student majors a total of ninety-eight cases from other
training institutions judged to be most nearly comparable to ours
was added. The number of responses of each kind from the
teacher group was then multiplied by 5 in order to make them
correspond numerically to the 1000 WAC cases, and the same
procedure as had been used in deriving the other keys was
repeated. The internal consistency of the scores thus derived was
improved over that obtained by the use of the first key (r = .81),
but the correlation with supervisors’ judgments of teaching
success was negligible (+.19). Again the two students who were
regarded as the best in their practice-teaching ranked among the
lowest in their scores on the association test.
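
The significance-level weighting described in the footnote on the M-F and Leadership keys can be illustrated with a short sketch. It is a reconstruction for illustration only: the function names, the 2 x 2 layout of the counts, and the use of an uncorrected chi-square with one degree of freedom are assumptions, since the paper does not give the computational details. The sketch also ignores the direction of the difference.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for the 2 x 2 table [[a, b], [c, d]].

    Assumed layout (not stated in the paper):
      a = group 1 subjects giving the response, b = group 1 subjects not giving it,
      c = group 2 subjects giving the response, d = group 2 subjects not giving it.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square value with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def weight_from_p(p):
    """Map a probability level to the 5-4-3-2-1 weights listed in the footnote."""
    for level, weight in ((0.001, 5), (0.01, 4), (0.02, 3), (0.05, 2), (0.10, 1)):
        if p <= level:
            return weight
    return 0  # differences short of the ten per cent level receive no score


def response_weight(a, b, c, d):
    """Weight for one response category from its 2 x 2 counts."""
    return weight_from_p(p_value_1df(chi_square_2x2(a, b, c, d)))
```

With two hypothetical groups of one hundred each, a response given by thirty members of one group and ten of the other yields a chi-square of 12.5 and therefore the maximum weight of 5.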

It was then decided to see whether or not any general trends in 
the kind of responses given by the group of teachers as opposed 
to the other groups studied could be observed; in other words, 
whether a ‘global’ as opposed to an analytical approach might at 
least provide useful suggestions. Although none of us profess 
much faith in intuition, all were nevertheless impressed by a 
‘feeling of difference’ in the pattern of responses given by the 
group of teachers which rendered us unwilling to give up the idea 
that a more effective method of scoring might yield something of 
value for the prognosis of teaching success. Two of us, therefore, 
read completely through ten of the papers of the ‘superior’ teach- 
ers from this and other institutions, immediately after which an 
equal number of those from the lower end of the distribution was 
examined in the same way. The classification of the teachers as
‘good’ or ‘poor’ was based upon the judgment of supervisors as 
indicated in a foregoing section of this paper. Each of us then 
made independent notes of the general kind of differences noted. 
As a further check, the two keys previously derived were exam- 
ined in detail for similarities or differences. 

It was assumed that a definable kind of response should be
regarded as possibly diagnostic when (a) it had been noted by
both of the two readers or (b) it occurred repeatedly with weight
favoring teaching-success in both of the previously derived keys.
However, because we had, by this time, become
convinced that a larger number of cases would be required in 
order to develop a key that would be sufficiently dependable for 
practical use, no attempt was made to carry out an elaborate 
statistical analysis of the material so obtained. Our only purpose 
was to see whether or not there seemed to be enough promise in 
the method to continue working with it. 

The first point observed was that the two readers had noted 
almost exactly the same points as favorable to teaching-success. 
Although responses falling under these heads often appeared in 
one or both keys, in a large number of cases they had previously 
received no score because, although in the aggregate they were 
given with much greater frequency by the teachers than by the
other groups, when the responses to each stimulus word were 
considered separately the number of individual occurrences of the 
response in question had been too few to meet the criterion for 
assigning a weight. On the other hand it was apparent that there 
were a good many types of response of possibly diagnostic signif- 
icance that had not been listed because of their limitation to a 
single stimulus word; they did not appear in other connections. 
Although it might have been well to have included at least some 
of these in the new scoring method, it was decided not to do so 
since we were interested in seeing whether or not the teachers were 
characterized by certain general types of ideas which were suffi-
ciently pervasive to cause them to crop up repeatedly, not only in
the usual connections but also in those not likely to occur to the 
majority of persons. With a larger number of cases these rela- 
tionships would be expected to show up statistically and in refer- 
ence to the individual stimulus words; with the number of cases 
at our disposal there were a good many instances in which the 
particular combination of stimulus and response would appear 
only once or twice but the idea would occur and recur in various 
connections. 

We therefore decided to try a method of scoring in which the 
stimulus words were disregarded except as they helped to clarify 
the idea which the subject presumably had in mind. A number 
of general categories which had been noted by the two readers 
were then defined as objectively as possible and the relative 
frequency with which these categories were referred to by teachers 
and non-teachers (Arts College students) was determined. Some 
of the categories were found not to differentiate clearly between 
the two groups, but others showed clear differentiation. An 
arbitrary weighting system based upon the total extent of the 
difference and also upon the number of different stimulus words 
eliciting the response was then developed as follows: 


Categories in which the ratio between the total number
of responses given by teachers and non-teachers respectively
was at least 5 to 1 and the number of different stimulus
words eliciting the response was not less than 2 to 1 ........ Weight 3.

Categories in which the total ratio was not less than 3 to 1
and the number of stimulus words eliciting the response was
greater for teachers than for non-teachers ........ Weight 2.

Categories in which the first but not the second of the
two conditions listed above was fulfilled or those in which
the ratio was at least 2 to 1 and the number of different
stimulus words eliciting the response was greater for the
teachers ........ Weight 1.
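
The three-step weighting rule just listed can be written out as a small decision function. This is a sketch of one reading of the rule, not the authors' procedure: the function and argument names are hypothetical, the two groups are assumed to be of comparable size so that raw counts can be compared directly, and "the two conditions listed above" is taken to mean the two conditions for Weight 2.

```python
def category_weight(teacher_total, other_total, teacher_words, other_words):
    """Return 3, 2, 1 or 0 for one response category.

    teacher_total, other_total: total responses in the category given by
        teachers and by non-teachers.
    teacher_words, other_words: number of different stimulus words that
        elicited the category in each group.
    """

    def at_least(numerator, denominator, ratio):
        # "at least `ratio` to 1", written without division so that a zero
        # count in the comparison group does not raise an error
        return numerator > 0 and numerator >= ratio * denominator

    more_words = teacher_words > other_words

    # Weight 3: total ratio at least 5 to 1, stimulus-word ratio at least 2 to 1
    if at_least(teacher_total, other_total, 5) and at_least(teacher_words, other_words, 2):
        return 3
    # Weight 2: total ratio at least 3 to 1 and more stimulus words for teachers;
    # Weight 1 when only the first of these two conditions holds
    if at_least(teacher_total, other_total, 3):
        return 2 if more_words else 1
    # Weight 1: total ratio at least 2 to 1 and more stimulus words for teachers
    if at_least(teacher_total, other_total, 2) and more_words:
        return 1
    return 0
```

For example, a category used 30 times by teachers and 10 times by non-teachers, elicited by 5 and 4 different stimulus words respectively, would receive a weight of 2 under this reading.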


The significant categories with their weights were found to be as 
follows: 


1) All references to children or babies ........ Weight 2.
Examples: rattle—baby; pupil—child; poor—little boy.

2) All references to children’s activities ........ Weight 3.
Examples: block—building; skip—hop; blow—bubbles.

3) All references to neatness ........ Weight 1.
Examples: put—away; pick—up; handle—carefully.

4) All references to music ........ Weight 3.
Examples: play—piano; beat—drum; note—scale.

5) All opposites except those with unfavorable or unpleasant
[...] ........ Weight 1.
Examples: right—left; host—guest; round—square.

6) All favorable adjectives referring to ability or behavior
........ Weight 3.
Examples: mind—bright; conduct—good; pupil—smart.

7) [...] ........ Weight 1.
Examples: pink—green; bat—ball; orange—lemon.

8) Active, present-tense verbs ........ Weight 4.
Examples: yarn—knit; plane—fly; lie—tell.
[Note: This category was included because the large number of
cases seemed to warrant it. Although the ratio was not quite 2
to 1 the difference between teachers and non-teachers was
significant at almost the one per cent level of confidence.]

9) All references to holidays, including birthdays ........ Weight 1.
Examples: present—Christmas; party—birthday; date—Thanksgiving.
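
Scoring a paper with the nine categories above might be sketched as follows. The paper does not spell out how category weights were combined into a total, so the simple sum used here is an assumption, as are the names `CATEGORY_WEIGHTS` and `score_paper`. Assigning a response to a category was a matter of judgment by the readers, so the sketch starts from category numbers that have already been assigned.

```python
# Weights as listed for categories 1-9 in the text above.
CATEGORY_WEIGHTS = {1: 2, 2: 3, 3: 1, 4: 3, 5: 1, 6: 3, 7: 1, 8: 4, 9: 1}


def score_paper(assigned_categories):
    """Sum the weights of the categories assigned to one subject's responses.

    assigned_categories: one entry per response, holding a category number
    (1-9) or None when the response fits none of the nine categories.
    """
    return sum(CATEGORY_WEIGHTS[c] for c in assigned_categories if c is not None)
```

A hypothetical paper whose scorable responses fell in categories 1, 2, 8 and 9 would thus earn 2 + 3 + 4 + 1 = 10 points, with unclassified responses contributing nothing.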

It was recognized from the beginning that the method was too 
crude to yield more than a rough indication of what might be 
secured from a more adequate standardization of the test. In 
comparing the results with other evidences of teaching ability, we 
were less concerned with ascertaining the usefulness of the 
method in its present form than with defining its potentialities if
standardized on a sufficient number of cases to yield reliable 
weights. Our major question was: Does the method appear 
sufficiently promising to warrant the expenditure of the additional 
time and effort needed to bring the group to a sufficient size to per- 
mit the derivation of reliable weights? The answer to this ques- 
tion obviously hinges upon the existence of a well differentiated 
pattern of semantic associations among teachers as compared to 
other professional groups, and a further differentiation between 
the associations of successful and unsuccessful teachers. 

The two hundred cases used in the development of the second 
key which was described in an earlier section of this paper were 
scored according to the method just described and the distribu- 
tions of their scores compared with those of other groups of cor- 
responding sex, age, and educational level. In making these 
comparisons the raw scores for the teachers were transformed into 
T-scores in which the mean for this group was assigned a T-score 
of 50 and the standard deviation was set at 10. By this method, 
a score of +1 SD would be designated as 60; a score of —1 SD 
would be counted as 40, etc. The range of T-scores for the 
teachers was from 30 to 73. It should be remembered in this 
connection that the teachers themselves constituted the group 
from which the T-scores were derived. 
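
The T-score transformation described above amounts to T = 50 + 10(x - M)/SD, where M and SD are the mean and standard deviation of the teacher group. A minimal sketch follows; the function names are hypothetical, and the use of the population rather than the sample standard deviation is an assumption, since the paper does not say which was used.

```python
import statistics


def t_scores(raw_scores):
    """Convert the raw scores of the reference group to T-scores (mean 50, SD 10)."""
    mean = statistics.fmean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # population SD: an assumption
    if sd == 0:
        raise ValueError("all raw scores are identical; T-scores are undefined")
    return [50 + 10 * (x - mean) / sd for x in raw_scores]


def t_score_against(raw, ref_mean, ref_sd):
    """T-score of one raw score relative to a reference group's mean and SD.

    This is how members of other groups would be placed on the teachers' scale.
    """
    return 50 + 10 * (raw - ref_mean) / ref_sd
```

A raw score one standard deviation above the reference mean gives `t_score_against(30, 20, 10) == 60.0`, matching the convention stated in the text.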

The internal consistency of the test as indicated by the cor- 
relation between odd-numbered and even-numbered columns for 
this group was +.67. For a somewhat more representative 
group made up of Arts College students and students from the 
General College of the University, it was +.76. These figures 
are not high but they do indicate some consistency in the general 
pattern of responses; a tendency for the various categories 
included in the key to hang together.* That the scores obtained 
were not purely circumstantial in nature is indicated by a cor- 
relation of +.55 between the results of the test given to thirty of 
the members of Group A during their junior and senior years, 
respectively. The interval between two testings was approxi- 
mately ten months. This correlation was not due simply to 
identity of response on the two occasions. A count showed that
the average number of identical associations by the same sub-
ject after the ten-month interval was only 57 out of 238 words in
the list, or not quite twenty-five per cent. When an equal num-
ber of papers from different subjects were paired at random it was
found that slightly over fifteen per cent of the responses were
identical. This is, of course, the result of the well-known fact
that the distribution of responses to a word-association test is
typically much skewed with a small number of associations
accounting for the majority of all the responses given.

* Correlations were also worked out between the sums of scores on the
nine categories separately. All were positive though, as was to be expected,
of lower magnitude than those for the total.
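
The odd-even internal-consistency figures quoted in this paper can be illustrated with a short sketch. It assumes that each subject's paper has been reduced to a list of per-item scores in test order; the paper speaks of odd-numbered and even-numbered items in one place and columns in another, and the item-level version is shown here. The function names are hypothetical. No Spearman-Brown correction is applied, because the paper does not say whether one was used.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def odd_even_correlation(item_scores):
    """Correlate the sums of odd-numbered and even-numbered items.

    item_scores: one list per subject, giving that subject's score on each
    item in test order (item 1 first).
    """
    odd_sums = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_sums = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_sums, even_sums)
```

The same `pearson_r` helper would serve for the retest correlation, taking each subject's total score on the first and second testing as the two lists.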

The distribution of T-scores made by the teachers was com- 
pared with that of women students in the Arts College and the 
General College of the University and with a group of WAC 
officers and privates of age and educational level corresponding 
to that of the student-teachers. Although the mean T-score of 
all three of the latter groups was lower than that of the teachers, 
the chief difference between the groups was seen in the percentages 
earning very high or very low scores. These figures are sum- 
marized in Table 1. 


TABLE 1.—T-SCORE PERFORMANCES OF VARIOUS GROUPS

                                                         Percentage of group with
                                                         T-score 60    T-score 40
Group                                                 N   or above      or below

Student-teachers at University of Minnesota         118      18            15
Best third of above according to ratings on
  practice-teaching                                  39      34            14
Lowest third                                         39      13            22
‘Superior’ teachers from various
  teacher-training institutions                      82      14            17
‘Poor’ teachers named by supervisors in various
  teacher-training institutions*                     20       5            50
Arts College students, University of Minnesota       20      10            30
General College students, University of Minnesota    20      15            25
WAC’s (college education; includes three former
  high-school teachers)                              20       0            25

* Eighteen of the twenty in this group had T-scores below 50.

Although the table clearly shows that some relationship exists
between success in teaching as judged by supervisors and score on 
the association test, there are nevertheless too many exceptions to 
the general tendency to make the test in its present form a 
dependable instrument for teacher selection. For the group of 
one hundred eighteen student-teachers the correlation with the 
pooled judgments of two supervisors was +.26. With the single 
judgment of the more experienced of the two supervisors it was 
+.30.* Although both of these correlations are significant at 
better than the one per cent level of confidence, they are merely 
suggestive of a trend and are of little value for prediction. 

* For nearly all the subjects the two judges agreed very closely in their
ratings, but there were a few cases where fairly marked discrepancies existed
and in these instances those given by the more experienced supervisor
usually approached more closely to the association test score.

It is noteworthy, however, that the errors of estimate are not
equally distributed in the two directions. It would appear from
these data that the chances of serious error in counselling that
would result from advising a student whose score on the associa-
tion test is one or more standard deviations above the mean of our
group to enter the field of primary education are not very great,
but that a fairly large number of potentially good teachers would
be lost to the profession if those making low scores were to be
debarred from entrance.

Correlations between test score and other measures tried were
rather consistently negligible. The Bell Adjustment Inventory,
the Army Alpha Revised Form 5, the Miller Analogies Test, the
Sophomore Culture Test and a number of other measures of
ability all yielded correlations that were close to zero. A low
correlation of +.20 was found with honor point ratio on professional
courses for a group of forty-five students. This falls short of the
five per cent level of confidence. Correlations with scores on
the M-F key, the leadership key and the commonality key of the
association test were also too low to be significant. The average
M-F score, however, was distinctly more ‘feminine’ than that
earned by most young women of corresponding age and educa-
tion. The majority of the cases earned scores within the top
quartile of their age group. There was a suggestion that both
exceptionally high and very low scores on the M-F scale were
unfavorable signs of teaching success, but more data are needed
to establish this trend reliably.
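
The significance statements attached to the correlations reported above (+.26 and +.30 for one hundred eighteen student-teachers, +.20 for forty-five students) can be checked approximately. The sketch below uses Fisher's z-transformation with a two-tailed normal approximation; this is a substituted modern convenience, not the test the authors say they used (they do not name one), and the function name is hypothetical.

```python
import math


def r_p_value(r, n):
    """Approximate two-tailed probability for a correlation r based on n cases.

    Uses Fisher's z-transformation: atanh(r) * sqrt(n - 3) is treated as a
    standard normal deviate under the hypothesis of zero correlation.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Under this approximation `r_p_value(0.26, 118)` and `r_p_value(0.30, 118)` both fall below .01, while `r_p_value(0.20, 45)` is well above .05, in agreement with the levels of confidence stated in the text.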

In summary, then, it would appear that the association test 
holds possibilities for the preservice selection of teachers, but 
that more adequate standardization upon a larger group of sub- 
jects is needed in order to develop a scoring key of sufficient 
reliability and validity to justify its practical use. However,
there is some warrant for encouraging students who, on the pres- 
ent form, earn scores as much as one standard deviation above the 
average to enter the field of early childhood education since it has 
been found that the majority of such candidates become successful 
teachers. Although unsuccessful teachers are more likely to 
earn low scores than are the successful ones, enough of the latter 
do so to make it very doubtful whether such persons should be 
actively discouraged from entering the teacher-training schools, 
particularly under the existing conditions of teacher shortage. 
In the near future we hope to be able to increase the size of our 
standardization group to a point where more dependable weights 
for the different responses can be derived. 








FACTORS INFLUENCING PERFORMANCE ON 
GROUP AND INDIVIDUAL TESTS OF 
INTELLIGENCE: II. SOCIAL 
FACILITATION 


MARY WOODS BENNETT 
Mills College 


In a previous article the present writer made a brief analysis of
factors influencing performance on group and individual tests
of intelligence. Among the factors discussed, namely, control of
testing conditions, age of person tested, test content, rate of per- 
formance, and social facilitation, only the last was found to 
depend directly and fundamentally upon the group or individual 
situation. 

In his discussion of the general subject of suggestion, Allport 
(3, pp. 245-281) conceives of this phenomenon as the control of 
bodily attitudes in various ways. He thinks of social facilitation 
as a type of suggestion which finds its expression in release of 
attitudes already built up by previous events or in enhancement of 
responses already under way. In situations in which people are 
working together, “the sight and sound of others doing the same
thing” (3, p. 261) constitute the social stimuli which set off this
response. Allport offers no final explanation of the mechanism of
social facilitation. He cites Meumann’s suggestion that effort
to overcome the distraction present in the group situation leads to 
greater attention to the task at hand and to increased output. 
While admitting that it may not apply to all cases, Allport thinks 
it likely that a process of circular conditioning occurs in which 
movements made during work become substitute stimuli for more 
work. The sight of similar movements made by others later 
comes to serve the same purpose. 

It is assumed that social facilitation, if it is operating as a potent 
force in the group testing situation, will reduce the influence of 
chance factors and will result in better and more consistent per- 
formance on the part of the subjects. These advantages will 
manifest themselves in increased scores on examinations adminis- 
tered under group conditions and in increased reliability and 
validity of the group test. If social facilitation affects some mem- 
bers of a group but not others, or all members in varying degree, 
its influence will still be revealed in higher group averages for 
those who work together and in higher reliability for the group
test, whether calculated on a retest or split-half basis, due to cor- 
related increases in scores of the subjects affected. 

The possibility that the group may have a disrupting or imped- 
ing rather than facilitating effect has not been overlooked. 
Allport suggests that distraction, overrivalry, and emotional
disturbance may be present, but concludes that their effects are 
relatively less important than those of social facilitation. While 
some distracting influences may be eliminated through control of 
testing conditions, it is conceivable that there are negative 
aspects of social facilitation, inherent in the group situation, 
which would lead to generally lower scores and possibly less con- 
sistent performance, depending on the nature of the impeding 
factors. 


REVIEW OF LITERATURE 


There have been several studies of the influence of the group on 
performance in various mental and motor functions. Most of 
these have attempted to separate the effect of a co-working group 
from the influence of rivalry or of a passive audience. While the 
findings of all the early work and of some later studies
point toward a stimulating effect of the group upon performance,
Dashiell’s experiments and those reported below yield negative
results.*

There have been only three investigations of the effect of social 
facilitation on intelligence test score. Stimulated by Allport’s 
observations, Weston and English” attacked the problem in two 
ways, using samples of ten and twenty college students as sub- 
jects. Two tests, presumably of equal difficulty, were con- 
structed from items from Thurstone’s Reasoning Test, Roback’s 
analysis and interpretation tests, and the American Council on 
Education examination. Five subjects worked individually on 
one form of the test, all took the second form together, then the 
other five students worked on the first form of the test alone. 
Eight of the subjects did much better in the group; the differences 
between the mean scores of the tests given under the two condi- 
tions all favored the group situation and were all more than four 
times their probable errors, whether the total sample or the 





*See items 7, 18, 21 in the bibliography for further references on this 
topic. 
separate subgroups were considered. The second approach con- 
sisted in testing twenty college students individually on parts of 
the American Council on Education intelligence examination. 
Half the students at Antioch College had taken this same exami- 
nation in groups of fifty or more, and all subjects, including those 
tested individually, had been tested in groups on an intelligence 
examination devised by L. J. O’Rourke. In view of their earlier 
results, Weston and English expected the subjects tested indi- 
vidually to show lower decile ranks on the American Council 
examination than on the O’Rourke. Very slight and insignificant 
differences were found favoring the test done in the group but the 
results were considered by the authors to be inconclusive. 

Farnsworth” presents the results of a study devised as a 
refinement of the one just considered. He used more subjects 
and paired the members of his two subgroups on the basis of 
intelligence test scores. On the assumption that the first results 
of Weston and English might have been due to differences in 
difficulty between the two test forms, he administered the same 
test to his two subsamples individually and in the group situation. 
The procedure of administration followed that of Weston 
and English, but several different tests were used and the num- 
bers of subjects taking each examination under the two conditions 
varied from twenty to thirty-six. Results are reported for the 
Ohio State University examination, both forms of the Terman 
Group Test given in quarter time, and for form B of the Otis 
Self-Administering test. Differences in mean scores for the two 
types of administration were small, inconsistent, and insignificant, 
and led the author to conclude that they showed no evidence of 
group effect. 

Krueger'! administered both forms of the Otis Self-Administer- 
ing test to one hundred sixty college students in a way calculated 
to reveal group influences. The total sample of subjects was 
divided into four smaller groups and the procedure of testing was 
so arranged that each form of the examination was given as first 
test and as second test both in the group and individually. No 
differences in score between individual tests taken before and 
after the group tests were found which could not be accounted for 
on the basis of practice. Similar results were found for group 
tests given before and after individual tests. When means of 
individual and group tests were compared, both representing the 
first test given, the difference was of the order of .50. When the 
same comparisons were made between group and individual examinations 
which followed another test, the difference was even 
smaller. The author concluded that the presence of the group 
had little effect on the Otis scores. 

Although these studies yield negative results it must be pointed 
out that only adult subjects were involved. Since Allport has 
suggested, on the basis of his own work and that of Mayer," 
Schmidt,!* and Meumann," that the effects of social facilitation 
are greater for children, there is still need for investigation of this 
factor as it influences the mental test performance of younger 
subjects. 


THE EXPERIMENT 


The sample.—The subjects were one hundred sixteen boys and 
one hundred thirty-two girls from the low seventh grades of three 
junior high schools. Half the group (hereafter referred to as the 
experimental group) belonged to the sample selected for the inten- 
sive study on adolescence then being carried on by the Institute 
of Child Welfare at the University of California. These chil- 
dren attended the same school in Oakland, California. The 
other half of the group (hereafter referred to as the control group) 
were selected from two schools in Berkeley, California. For each 
member of the experimental group a partner of the same sex in the 
same grade was found in one of the Berkeley schools who was not 
more than a month older or younger and who did not differ in IQ 
by more than 5 points.* The socio-economic status of the two 
groups was comparable. All subjects were American-born white 
and the majority were children of American-born parents. 
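
The matching rule just described (same sex, same grade, age within one month, IQ within 5 points) can be sketched in code. The sketch below is illustrative only: the `Pupil` record, its field names, and the greedy first-fit search are assumptions introduced here, and nothing in the report says how partners were actually located. Age is expressed in months.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Pupil:
    # Hypothetical record; the field names are assumptions for illustration.
    name: str
    sex: str         # "M" or "F"
    grade: str       # e.g. "L7" for low seventh grade
    age_months: int
    iq: int          # e.g. a Kuhlmann-Anderson IQ


def is_match(a: Pupil, b: Pupil, max_age_gap: int = 1, max_iq_gap: int = 5) -> bool:
    """Same sex and grade, age within one month, IQ within five points."""
    return (
        a.sex == b.sex
        and a.grade == b.grade
        and abs(a.age_months - b.age_months) <= max_age_gap
        and abs(a.iq - b.iq) <= max_iq_gap
    )


def pair_groups(experimental: List[Pupil], pool: List[Pupil]) -> Dict[str, Optional[str]]:
    """Greedy first-fit pairing; each pupil in the pool is used at most once."""
    available = list(pool)
    pairs: Dict[str, Optional[str]] = {}
    for subject in experimental:
        partner = next((c for c in available if is_match(subject, c)), None)
        if partner is not None:
            available.remove(partner)
        pairs[subject.name] = partner.name if partner is not None else None
    return pairs
```

A greedy search is the simplest choice and may leave some subjects unmatched where a more careful assignment would not; it is used here only to make the matching criteria concrete.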

Procedure.—The Terman Group Test of Mental Ability was 
used for the present investigation. Each member of the experi- 
mental group was examined individually on form A and form B, 
the interval between examinations ranging from seven to twenty- 
seven days. The members of the control sample were examined 
in groups (fifty-four in one school, twenty in the other) and 
retested on the alternate Terman form after an interval of ten 





* Intelligence quotient obtained for the Kuhlmann-Anderson examination 
administered in the high sixth grade was available for all subjects. 
days. In both experimental and control samples half the children 
had form A first, half form B.** 


RESULTS 


In relation to increase of score due to social facilitation.—Table I 
shows various distributions of measures for the experimental and 
control groups. Results were calculated separately for boys and 
girls to reveal possible sex differences in social facilitation. It 
will be seen at once that the mean scores for matched and experi- 
mental samples are closely similar on both first and second test for 
both sexes. Among boys, the control group shows somewhat 
lower standard deviations on both tests than does the experi- 
mental group. With girls, the control and experimental samples 
have approximately the same standard deviation on the first 
test, but the controls show greater variation than the experi- 
mental subjects on the second examination. There is no clear 
evidence for any average effect of social facilitation on group 
performance. 

It must be admitted that other factors besides social facilitation 
may have been operative in the situations in which these scores 
were obtained, but it is believed that their influence was reduced 
to a minimum. It might be argued, for example, that advantages 
in the group procedure due to social facilitation were offset by 
advantages in the individual situation due to better control of 
testing conditions. Every effort was made, however, to obtain 





** Because data on rate of work were being collected at the same time, 
certain alterations were made in the usual test procedure for the experimental 
group. During one sitting, subtests 3, 6, and 8 were omitted. The remain- 
ing seven subtests were administered under standard time conditions, and a 
record was made by means of a Bristol strip recorder of number of seconds 
spent on each item. During the other sitting each child took the alternative 
form of the test, including all subtests, under unlimited time conditions. 
On this occasion the subject’s progress at the end of the standard time allow- 
ance was noted, and total time spent on each subtest was taken by means of 
a stop-watch. Examinations of average point scores obtained on seven 
subtests on each form of the test when given as first and as second test and 
when administered under conditions of standard and unlimited time revealed 
no difference related to method of administration of the test. 

It was necessary to alter instructions during the individual examination 
to insure accuracy of time records and to guard against a hurried attitude on 
the one hand and dawdling on the other. Such changes were incorporated 
in instructions for the group examinations. 
the best possible conditions with the group method. Further- 
more, the individual situation (due to the character of the test 
used) lacked certain of the features of the typical individual 
examination; the individual tests of the present study gave less 
opportunity for sociability and differed less from the group tests 
in the matter of the personal relationship between examiner and 
subject than would have been the case if the Stanford Revision 
had been used. The question might be raised as to whether or 
not rivalry was operative in either situation or in both, but it was 
assumed that this influence would be negligible and unsystematic 


TABLE I. COMPARISONS BETWEEN DISTRIBUTIONS OF SCORES FOR 
EXPERIMENTAL AND CONTROL GROUPS ON SEVEN SUBTESTS 
OF THE TERMAN GROUP TEST 

                        Boys                          Girls
Point score†   E I*   C I    E II   C II     E I    C I    E II   C II
120-129          1     -      -      -        -      -      -      -
110-119          2     -      2      -        -      -      -      -
100-109          1     3      2      4        -      -      3      2
90-99            5     3      9      6        2      2      5      5
80-89            8    15     13     18        8      8      7     13
70-79            8     7     10     10       10      7     13      9
60-69           18    12     12      7       13     19     19     17
50-59            6     5      3      6       17     16      9      9
40-49            4     7      4      5       12      5      8      7
30-39            3     6      1      2        3      5      2      2
20-29            2     -      1      -        1      4      -      2
10-19            -     -      1      -        -      -      -      -
Number          58    58     58     58       66     66     66     66
Mean          69.2  68.4   74.9   74.0     61.1   57.3   67.6   67.8
SD            20.6  18.8   20.0   17.9     16.0   16.5   16.5   18.3

* Symbols are to be interpreted as follows: E, experimental group; 
C, control group; I, first test; II, second test. 

† Total point score on subtests 1, 2, 4, 5, 7, 9, 10. 
in view of the fact that the relative amount and quality of their 
output was not known to the subjects. '* 

In relation to increase in reliability due to social facilitation.—In 
order to investigate the reliability of the Terman test under group 
and individual conditions of administration, Spearman-Brown 
reliability coefficients, based on odd-even correlations, were 
computed for measures obtained under a variety of conditions. 
Some of the comparisons were between scores on seven subtests, 
since one of the examinations for the experimental group consisted 
of a shortened test; similar conclusions would, however, be 
expected from the use of the full test, as is indicated by the 
following correlations between score on seven subtests and score 
on ten:* 


                      Experimental        Control
Boys ..............   [illegible]         [illegible]
Girls .............   .951 ± .008         .981 ± .003


In Table II are presented the Spearman-Brown reliability 
coefficients for the experimental and control groups together with 
test-retest correlations. The latter represent comparisons 
between scores on the two different forms of the Terman test with 
an interval of less than one month between examinations and 
constitute a supplementary indication of the reliability of the test. 
Probable errors for the coefficients of Table II range from .015 to 
.031. No large or systematic differences in reliability are found 
for the two kinds of test procedure and it must be concluded either 
that social facilitation is not operative or that it does not serve 
to increase the reliability of a test for groups of this description. 
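
The odd-even procedure named above can be illustrated with a short sketch. It assumes the standard Spearman-Brown step-up for a test of doubled length, r_full = 2r / (1 + r), applied to the correlation between odd-item and even-item half scores. The function names and the 0/1 item-score layout are assumptions made for illustration, not details taken from the study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def odd_even_reliability(item_scores: Sequence[Sequence[int]]) -> float:
    """item_scores holds one row per subject and one column per item (1 = right)."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))
```

For example, a half-test correlation of .85 steps up to 2(.85)/1.85, or about .92, which is the order of magnitude of the coefficients reported in Table II.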

As a check on these findings, Spearman-Brown reliability 
coefficients were obtained from the Institute records for the Ter- 
man Group Test administered as a regular group examination to 
the members of the adolescent sample and their classmates two 





* Both of these scores were obtained for the experimental group on the 
occasion of the complete test taken under conditions of unlimited time, this 
examination being the first one for approximately half of the subjects. In 
finding the equivalent correlations for the control group, the results from 
the first or second test were used according to the one employed for a given 
partner in the experimental sample. All scores used in comparisons of the 
matched groups were, of course, obtained within standard time limits, since 
all of the tests administered to the control group were given with the regular 
time allowances. 
years later. Each child had both forms of the test with an 
interval of about one week between examinations. Reliability 
coefficients, based on the records of one hundred sixty-four girls 
and one hundred seventy-two boys, range from .932 to .961. 
They are slightly higher than are those for the complete test in 
Table II, but present no conclusive evidence of superiority of 
group methods of administration. 


TABLE II. SPEARMAN-BROWN RELIABILITY COEFFICIENTS AND 
TEST-RETEST CORRELATIONS ON THE TERMAN GROUP TEST 
FOR THE EXPERIMENTAL AND CONTROL GROUPS 

                                                    Boys            Girls
Coefficient                                       E      C        E      C
Spearman-Brown reliability
  Seven subtests of first test administered     .919   .913     .890   .899
  Seven subtests of second test administered    .894   .888     .902   .903
  Complete test: first or second administered   .917   .913     .910   .925
  Complete test: first test administered          -    .918       -    .939
  Complete test: second test administered         -    .882       -    .932
Test-retest correlation
  Seven subtests                                .848   .824     .831   .851
  Complete test                                   -    .873       -    .853
















In relation to increase in validity due to social facilitation.—No 
common criterion was available by which the relative validity of 
the Terman test administered to the control and to the experi- 
mental samples could be determined. However, standard scores 
on a variety of intelligence examinations (CAVD, Stanford- 
Binet, Kuhlmann-Anderson) and on four administrations of the 
reading and arithmetic sections of the Stanford Achievement Test 
were available for the subjects of the experimental group. 
Furthermore eighty-four of the boys and eighty-six of the girls of 
this group took the Terman test as a group examination in their 
regular classes two years after the conclusion of the present 
experiment. Thus for eighty-four boys and eighty-six girls it was 
possible to correlate standard scores in the Terman Group Test 
when taken as an individual test, and when taken as a group test, 
with a composite of intelligence test scores and with a composite 
of achievement test scores. These coefficients, the probable 
errors of which range from .010 to .027, are presented in Table 
III. The validity of the Terman test, as determined by the 
intelligence test composite, is of the order of .80. The coefficient 
for boys tested in a group is slightly but not significantly higher. 
Validity as determined by comparison with the achievement 
criterion is definitely higher in the group situation for both boys 
and girls, but in view of the results from the correlations with the 
intelligence test criterion, it seems unlikely that the group situa- 
tion is responsible for the difference. 


TABLE III. VALIDITY COEFFICIENTS FOR THE TERMAN GROUP 
TEST ADMINISTERED AS A GROUP TEST AND AS AN INDIVIDUAL 
TEST TO THE EXPERIMENTAL GROUP 

                                          Grade VII          Grade IX
                                          Individual          Group
Validity coefficient                     Boys   Girls      Boys   Girls
With composite of intelligence test
  scores as criterion                    .803   .799       .829   .796
With composite of achievement
  test scores as criterion               .836   .790       .930   .856

















SUMMARY AND CONCLUSIONS 


The findings from early studies on social facilitation point to 
a facilitating effect of the group on the performance of subjects 
who work with others. Those who have examined the influence 
of this factor on mental test performance find negative results in 
the main, but no investigations have been carried out on children. 
In the present experiment the problem is attacked through com- 
parisons of scores obtained under individual and under group 
procedures by matched samples of seventh-grade children, all 
factors other than that of social facilitation being eliminated or 
reduced to a minimum. Further evidence is supplied by scores 
obtained by the same subjects tested individually and in groups 
with a two-year interval between tests given under different con- 
ditions. It was assumed that if a factor of social facilitation 
existed, its effect would be manifested in higher group averages for 
children tested together and in greater validity and reliability of 
tests administered by group procedures. The results are as 
follows: 

1) Comparison of mean scores obtained on the Terman Group 
Test by children examined individually and by those tested in 
groups shows no advantage for the latter. 

2) The reliability of the Terman Group Test is in general the 
same for children tested individually and for a matched sample 
tested as a group. This finding holds for first and second tests 
and for reliability as determined both by odd-even and by test- 
retest correlations. 

3) The validity of the Terman Group Test, administered to the 
experimental group as an individual examination and to the 
same subjects two years later as a group test, shows no change 
when other intelligence test scores are the criterion. When 
achievement test scores are the criterion there is a statistically 
insignificant increase in validity under group conditions of 
administration for both boys and girls. In view of the results 
with the other criterion, however, it seems likely that this change 
is due to factors other than that of social facilitation. 

There is no indication from the present study that the “sight 
and sound of others doing the same thing” has a facilitating effect 
on the average performance of those who work in a group. The 
findings give no evidence of the operation of negative social 
facilitation, nor do they rule out the possibility that the effect of 
the co-working group may be facilitating for some of its members, 
impeding for others. It seems probable that the number of 
persons so much influenced by the conditions of group and indi- 
vidual mental testing that they do consistently better in one type 
than the other is so small as not to constitute a serious indictment 
of either procedure. It should be borne in mind, however, that the 
effect of social facilitation may vary with different kinds of work® 
and among subjects of different levels of intelligence. * 

If it had been demonstrated that social facilitation operates to 
secure better performance on group tests than on individual tests 
of intelligence it might be argued that intelligence is an essentially 
social function; that the individual examination handicaps the 
subject and yields questionable scores, at least at the age level 
under consideration. The results indicate, however, neither 
disadvantage for individual testing conditions from lack of social 
facilitation, nor advantage for group conditions because of it. 
Argument for use of either type of examination must rest on other 
grounds. 


A STUDY OF THE SOCIOMETRIC PROCESS AMONG 
SIXTH-GRADE CHILDREN 


MERL E. BONNEY 
North Texas State College, Denton, Texas 


One of the most significant uses of sociometric techniques is 
that of studying the choice process within a group organization. 
To what extent do those who receive the greatest number of 
choices for a particular purpose vote for those who receive the 
least number of choices and, vice versa, to what extent do those 
who are low vote for those who are high? Also, to what extent 
do those who are in the highest group vote for others who are also 
high, and likewise, to what extent do those who are in the lowest 
group vote for others who are also low? It is with such questions 
as these that this report is concerned. 

The subjects used for this investigation were sixth-grade pupils 
in the Demonstration School associated with the North Texas 
State College and two public schools of Denton, Texas. The 
data were gathered during the school year of 1943-1944. In the 
three schools combined there were approximately one hundred 
children. 

During the spring semester a mimeographed form was adminis- 
tered to these children asking them, first, to list the names of other 
children in their respective rooms with whom they played most 
often, and, second, to list the names of those in their respective 
rooms whom they would prefer to have on their side for a Quiz 
Kid program during which the participants would compete with 
each other in answering questions on the war and other topics. 
The directions stated that they were to list as many names as they 
wished and that they were to put them in order of preference. A 
few children listed as many as eleven names, while others listed 
only two or three. 

The score given to each child was the total number of choices 
received without regard to order of preference.* (The preference 





* A check on the data to determine what differences would result from 
assigning score values to the different degrees of preference, as compared to 
counting each choice as one vote regardless of order of preference, revealed 
that very few differences would result. In the high and low groups not 
more than one or two names in any one group in the several schools would 
have been affected by using the scaled system of scoring. Consequently, 
factor will be included in the data at later points.) These raw 
scores varied from 0-17 for the playmate choices and 0-28 for the 
Quiz Kid choices. The range varied only three or four points in 
the different schools. This being true, it seemed legitimate to 
combine the data from the three schools. When this was done, it 
was possible to select from these three schools a high group for 
choice of playmates consisting of twenty-six children and a low 
group consisting of twenty-nine children. Each of the cor- 
responding groups for the Quiz Kid program was composed of 
twenty-six cases. 
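
The scoring rule described above, in which each listed name counts as one vote regardless of its position, can be sketched as follows, together with one possible scaled alternative of the kind mentioned in the footnote. The report does not give the weights that were tried, so the linear weights in `weighted_choices` are purely hypothetical, as are the function names and the ballot layout (each child's name mapped to the ordered list of names he or she wrote down).

```python
from collections import Counter
from typing import Dict, List


def choices_received(ballots: Dict[str, List[str]]) -> Counter:
    """Unscaled score: every listed name counts as one vote."""
    tally = Counter({child: 0 for child in ballots})
    for chooser, chosen in ballots.items():
        for name in chosen:
            if name != chooser:          # ignore any self-nomination
                tally[name] += 1
    return tally


def weighted_choices(ballots: Dict[str, List[str]], top_weight: int = 5) -> Counter:
    """Hypothetical scaled score: first choice earns top_weight points,
    second choice one point less, and so on, never less than one point."""
    tally = Counter({child: 0 for child in ballots})
    for chooser, chosen in ballots.items():
        for rank, name in enumerate(chosen):
            if name != chooser:
                tally[name] += max(top_weight - rank, 1)
    return tally
```

Ranking the children by either tally and taking the two extremes gives high and low groups of the kind compared in the sections that follow.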

The extent to which these high and low groups were differ- 
entiated on the basis of the choices given can be seen from the 
following figures: 

The members of the ‘high play group’ received an average of 
ten choices each while those in the low group received an average 
of only two. The pupils in the ‘high Quiz Kid group’ were given 
an average of sixteen choices while those in the opposite low group 
were given an average of only .26. These figures emphasize the 
marked differences in degree of preference shown for the respective 
groups. Also, they show that the gulf between the upper 
and lower groups for the Quiz Kid program was much greater 
than that between the upper and lower groups in choices for 
playmates. From this finding it is evident that preferences for 
playmates were more widely distributed throughout the entire 
populations studied than was true for the Quiz Kid program. 
More will be made of this point in subsequent portions of this 
report. 


CHOICES OF HIGH AND LOW GROUPS FOR EACH OTHER 


We may start off with the question of how much the high group 
members chose those in the low group as play companions, and 
how much the low group members voted for the high ones. An 
analysis of the data revealed that nine of the twenty-six high 
group members chose eight of those in the low group as play 
companions. Fourteen of the low group members chose seventeen 
of those most popular as playmates, and the great majority 
of these choices were first, second, or third in order of preference. 
Only nine in the high group received no attention from those in the 





the unscaled system was used, since it was simpler and appeared to serve the 
purpose equally well. 
lowest category, as shown in Table I. From this table it can also 
be seen that the interchange of votes between the high and low 
groups resulted in twelve mutual attachments. 


TABLE I. EXTENT OF MUTUAL AND UNRECIPROCATED CHOICES 
BETWEEN THE HIGH AND LOW GROUPS IN THEIR VOTING FOR 
EACH OTHER FOR PLAY COMPANIONS 

                                               Number Receiving No
               Unreciprocated Choices          Votes from Opposite Group
Mutual         High Unrec.*   Low Unrec.       High Receiving   Low Receiving
Choices        Low Group      High Group       None from Low    None from High
12 Pairs       33 Pairs       0 Pairs          9 (35%)          21 (72%)

















* Unreciprocating. 


Further examination of Table I reveals that much of the respon- 
siveness of the low group children for those at the top was defi- 
nitely on a one-way street. This is shown by the thirty-three 
unreciprocations received by the low group pupils, and by the fact 
that twenty-one (or seventy-two per cent) of their twenty-nine 
members did not draw a single vote from their more favored 
classmates. By contrast none of the choices given by high group 
children to those in the low group was unreciprocated. 
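
The categories of Table I can be made concrete with a small sketch that classifies the choices passing between two groups. The function name, the ballot layout (each child's name mapped to the list of names chosen), and the returned keys are assumptions introduced for illustration; the report does not describe how the tabulation was carried out.

```python
from typing import Dict, List, Set


def cross_group_choices(
    ballots: Dict[str, List[str]], high: Set[str], low: Set[str]
) -> Dict[str, int]:
    """Count mutual and one-way choices between a high and a low group."""
    chose = {(a, b) for a, names in ballots.items() for b in names}
    mutual = {
        (h, l) for h in high for l in low
        if (h, l) in chose and (l, h) in chose
    }
    # Choices made by low members that the chosen high member did not return.
    high_unrec_low = {
        (l, h) for l in low for h in high
        if (l, h) in chose and (h, l) not in chose
    }
    # Choices made by high members that the chosen low member did not return.
    low_unrec_high = {
        (h, l) for h in high for l in low
        if (h, l) in chose and (l, h) not in chose
    }
    high_no_votes = {h for h in high if not any((l, h) in chose for l in low)}
    low_no_votes = {l for l in low if not any((h, l) in chose for h in high)}
    return {
        "mutual_pairs": len(mutual),
        "high_unreciprocating_low": len(high_unrec_low),
        "low_unreciprocating_high": len(low_unrec_high),
        "high_receiving_none_from_low": len(high_no_votes),
        "low_receiving_none_from_high": len(low_no_votes),
    }
```

The percentages in the last two columns of the table follow directly from such counts: 9 of 26 high group members is about 35 per cent, and 21 of 29 low group members is about 72 per cent.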

The implications of the above data can best be discussed along 
with the findings on the voting for the Quiz Kid program. There- 
fore, we shall turn our attention to this aspect of the data. 


CHOICES OF HIGH AND LOW GROUPS FOR EACH OTHER FOR THE QUIZ 
KID PROGRAM 


Only one high group member voted for a low group child for 
the Quiz Kid program. This choice was reciprocated, resulting 
in one mutual pair, as shown in Table II. Every child in the low 
category chose one or more individuals in the upper group, but 
twenty-five of the twenty-six low group children received no 
votes whatever from the top ranking individuals. 
TABLE II. EXTENT OF MUTUAL AND UNRECIPROCATED CHOICES 
BETWEEN THE HIGH AND LOW GROUPS IN THEIR VOTING FOR 
EACH OTHER FOR THE QUIZ KID PROGRAM 

                                               Number Receiving No
               Unreciprocated Choices          Votes from the Opposite Group
Mutual         High Unrec.*   Low Unrec.       High Receiving   Low Receiving
Choices        Low Group      High Group       None from Low    None from High
1 Pair         88 Pairs       0 Pairs          1 (3.8%)         25 (96%)

















* Unreciprocating 


The result of this one-way choosing is vividly shown in the 
fact that the low group members amassed a total of eighty-eight 
unreciprocations from the top group. This is more than two 
and one-half times as many unreciprocations as the low group 
received from the high group in the choosing of playmates. 

What practical implications arise from the findings just 
presented? Primarily, the implications center on the greater 
socializing value of play as compared with an activity involving 
knowledge and academic skill, such as the Quiz Kid program. 
This is shown in the fact that there was a greater distribution of 
choices throughout the entire group in the choosing of playmates; 
it is shown in the much larger number of mutual attachments 
between the high and low groups (twelve to one) in the voting for 
play companions than in the voting for the Quiz Kid program; 
and it is further emphasized in the much greater number of 
unreciprocations given the low group members in their voting for 
high group members for the Quiz Kid program than in their voting 
for choice of playmates. 

These findings show that some children who are far apart on 
a scale of general group desirability as play associates, may never- 
theless play with each other and apparently with some degree 
of satisfaction since each one accepts the other in this capacity. 
On the other hand, those who are lowest in total group estimation 
from the standpoint of their acceptance as a co-worker in an 
activity involving knowledge, are almost completely rejected by 
those who are rated highest in this regard. It would seem, then, 
that one of the best ways for schools to promote inter-personal 
relations between those children who are very well accepted on the 
one hand, and those who are poorly accepted on the other, would 
be through a comprehensive and well directed play program. 
This would naturally be of greatest aid to those who are low, since 
those high in social acceptance would have plenty of satisfying 
relationship without close contacts with those who are low. 
However, it must be assumed that there would be mutual advan- 
tages for social development between the individuals concerned. 

As will be recalled from the description of methods used in 
this investigation, the children were asked to list their choices 
in the order of preference, beginning with number one as the most 
preferred individual. As would be anticipated, the choices for 
play companions given by high group members to those in the low 
group were not among the highest in order of preference. Only 
one was as high as second choice, the others ranging from fourth 
place to eleventh place. The average preference order was sixth 
place. However, considering the extreme nature of the groups 
compared, even this amount of positive acceptance is a social gain 
for those who are least desired as play associates. It is definitely 
more than the low groups received from the highest groups when 
the basis of selection involved knowledge and academic skills. 

The point may be made here that one of the best ways to 
help a child (or an adult) who is low in social acceptance would 
be to teach him to play various games, since it is through such 
activities that he is most likely to establish some degree of accept- 
ance with those who are already socially preferred individuals. 
It is worth noting in this connection that one of the most generally 
adopted methods of rehabilitating maladjusted individuals is 
through the use of play programs. 

A significant aspect of the present data is the extent to which 
those who were high or low in the choices for playmates were also 
high or low in the choices for the Quiz Kid program. An analysis 
of the data on this point revealed that sixty-one per cent of those 
in the high group on choices for playmates were also in the high 
group in choices for the Quiz Kid program. Also, fifty-two per 
cent of those in the low group in choices for playmates were also 
in the low group in choices for the Quiz Kid program. Three 
of those in the low play group were in the high Quiz Kid group, 
but none of those in the low Quiz Kid group were in the high play 
group. The three children who jumped from one extreme to the 
other in the two measurements were cases of academically 
capable children who were poor on the playground and possessed 
objectionable personal traits. The fact that there were three 
such cases in the small population studied shows how a child may 
possess academic prestige and yet be rejected for a relationship 
which is primarily social. This point emphasizes to parents and 
teachers the need of considering a child’s acceptance in play as a 
measure of his social acceptance and personality adjustment as 
well as whatever rôle he may play as a leader. 

If there be some who consider popularity as a playmate as a 
rather superficial form of social recognition, attention is called to 
the finding given above which showed that sixty-one per cent of 
those receiving the highest number of choices as playmates were 
also accorded the highest degree of academic prestige as measured 
by the choices for the Quiz Kid program. Furthermore, none 
of those in the high play group were in the lowest group in aca- 
demic prestige, whereas, a little more than half of those in the 
lowest play group were also in the lowest group in the voting for 
the Quiz Kid program. In other words, judging by the evidence 
of this study, high popularity as a play companion is much more 
likely to be associated with other forms of classroom prestige 
than the reverse. 

However, when total groups in this study were compared 
(including ninety-nine cases) by means of the correlation tech- 
nique, the Pearson r proved to be only .42 + .08. This may be 
considered a fair degree of association, but it is not sufficiently 
high to afford safe grounds for generalizing from one kind of 
acceptance to the other. This relatively low degree of general 
relationship between the two kinds of measurements emphasizes 
the point previously made; namely, that academic prestige is no 
guarantee of superior acceptance in a social relationship. 
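
The figure of .08 attached to the correlation can be checked with a short calculation. The sketch below assumes that it is the classical large-sample standard error of r, (1 - r²)/√N; the report does not state which formula was used, and the function name is illustrative only.

```python
import math


def standard_error_of_r(r, n):
    """Classical large-sample standard error of a Pearson r (assumed formula)."""
    return (1.0 - r ** 2) / math.sqrt(n)


r, n = 0.42, 99  # values reported in the text
se = standard_error_of_r(r, n)
shared_variance = r ** 2  # proportion of variance common to the two measures

print(f"r = {r:.2f}, SE = {se:.2f}, r squared = {shared_variance:.2f}")
```

Under this assumption the standard error comes to about .08, and the square of r shows that less than one fifth of the variance in one kind of acceptance is associated with the other.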


EXTENT TO WHICH THE EXTREME GROUPS CHOSE OTHERS IN THEIR 
OWN GROUPS 


We may now turn our attention to the question of the extent to 
which the high group members and the low group members voted 
for others in their own particular groups. This will be significant 
from the standpoint of the process of group identification. 

Let us first consider the findings on choices of playmates. It is 
evident from Table III that the high group members voted for 


TABLE III.—EXTENT TO WHICH THE HIGH AND LOW GROUPS 
VOTED FOR THEIR OWN GROUP MEMBERS FOR PLAYMATES 

                                        High Group   Low Group 
Mutual choices                          38 pairs     8 pairs 
Unreciprocated choices                  19 pairs     12 pairs 
Number receiving no choices from 
  their own group members               0            9 (31%) 

















themselves to a far greater extent than the low group members 
voted for themselves. All the top ranking individuals voted for 
one or more of their group, and from this interchange of choices, 
thirty-eight mutual attachments resulted. By contrast, nine of 
the low group children received no choices at all from their own 
membership. Those who did vote for each other established 
eight mutual attachments. It will be noted that the number of 
mutual attachments among the upper group is almost five times 
as great as found among the low group, in spite of the fact that 
there were three more individuals (twenty-nine compared to 
twenty-six) in the low group available to give choices to each 
other. This means, on the part of the high group members, a 
much greater degree of mutual identification with each other, and 
more in-group feelings. 

It is significant to note that the low group individuals actually 
succeeded in establishing more mutual ties with members of the 
high group than they did with others in their own group. There 
were twelve with the high group as compared with eight for their 
own. What is the explanation of this unexpected finding? 

There are at least two factors which would enter into this 
result. 










First, the low group members gave more choices (forty-two as 
contrasted to twenty-seven) to those in the high group than they 
gave to their own membership. This giving of more choices 
would naturally increase the chances of reciprocation if the high 
group members voted for the low ones at all. 

Second, it is known that some children who stand high in group 
acceptance are characterized by generous and humanitarian atti- 
tudes which cause them to include in the orbit of their social 
interests individuals who are generally low in group acceptance. 
It is true that this inclusion seldom, if ever, represents a high 
order of preference (as reported above for this study), but it is 
nevertheless something, and it is frequently enough to mean a 
good deal to those who are on the lower rungs of social status. 
It is enough to cause some of them to designate as their ‘best 
friend’ a child who is very high in social status when, as a matter 
of fact, such choices represent more wishful thinking than concrete 
realization. 

An additional finding in Table III which might cause some surprise 
is that there were more unreciprocated pairs among the high group 
members than among the low ones. The explanation of this result 
is found in the number of choices given. 
Whereas those in the top group gave each other a total of ninety- 
four votes, those in the low group gave each other a total of only 
twenty-seven. This large difference in responsiveness to each 
other would obviously increase the chances for unreciprocation 
among the high group members as well as produce many more 
mutual acceptances. The point should not be overlooked that 
the extent of unreciprocation among those in the low group was 
greater than the extent of mutual acceptance. 
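
The tallying of mutual and unreciprocated choices within a group can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the names and choices are invented, the function name is hypothetical, and a pair is counted once whether the choice runs in one direction or both.

```python
from itertools import combinations


def tally_pairs(choices):
    """Return (mutual pairs, unreciprocated pairs, members chosen by no one).

    `choices` maps each group member to the set of fellow members chosen.
    """
    members = sorted(choices)
    mutual = 0
    one_way = 0
    for a, b in combinations(members, 2):
        a_chose_b = b in choices[a]
        b_chose_a = a in choices[b]
        if a_chose_b and b_chose_a:
            mutual += 1
        elif a_chose_b or b_chose_a:
            one_way += 1
    chosen = set().union(*choices.values()) if choices else set()
    unchosen = [m for m in members if m not in chosen]
    return mutual, one_way, unchosen


# Hypothetical within-group choices (invented, not data from the study).
group = {
    "Ann": {"Bob", "Cal"},
    "Bob": {"Ann"},
    "Cal": {"Bob"},
    "Dee": {"Ann"},
}
mutual, one_way, unchosen = tally_pairs(group)
print(mutual, one_way, unchosen)
```

In this invented group there is one mutual pair, three unreciprocated pairs, and one member who receives no choice from the group, which is the same kind of summary reported in Table III.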

Turning now to the data on choices for the Quiz Kid program, 
it will be apparent from Table IV that all the high group members 
received votes from others in their particular group, and that 
from this interchange of votes twenty-three mutual and forty-five 
unreciprocated choices resulted. When these results are com- 
pared with those obtained for play companions, it becomes evi- 
dent that the amount of positive inter-personal acceptance among 
those most favored as playmates is much greater than is found 
between those most favored as partners in the Quiz Kid program. 
There were fifteen more pairs (38-23) of mutual attachments 
among those most popular as playmates, and at the same time 
TABLE IV.—EXTENT TO WHICH THE HIGH AND LOW GROUPS 
VOTED FOR THEIR OWN GROUP MEMBERS FOR THE QUIZ KID 
PROGRAM 

                                        High Group   Low Group 
Mutual choices                          23           0 
Unreciprocated choices                  45           1 
Number receiving no choices from 
  their own group members               0            25 




















twenty-six fewer unreciprocations (45-19) as compared with the 
top group for the Quiz Kid program. A few more unreciproca- 
tions might be expected among the Quiz Kid subjects since their 
high group gave each other a total of one hundred eleven choices 
as compared with a total of ninety-four given by the top play 
group to each other—the number of cases being the same in both 
instances. However, by the same token, there should have been 
a few more mutual ties among the members of the high Quiz Kid 
group, but the reverse was found to be true. 

This evidence supports the point made in the preceding section 
in regard to play activities providing a much better medium for 
establishing inter-personal relationships than is provided by those 
activities involving knowledge and skill. It is quite likely that an 
element of jealousy or rivalry enters into the relationship between 
individuals who are competing with each other for academic suc- 
cess and prestige to a greater extent than is true when play 
activities are involved. Whether or not this is the correct inter- 
pretation of the above evidence, the fact is clear that there was a 
considerably greater number of mutual attachments, and con- 
siderably fewer unreciprocations, among the members of the 
highest popularity group in choices for play companions than in 
the corresponding high group in choices for the Quiz Kid program. 

When attention is turned to the voting of the low group mem- 
bers for each other for the Quiz Kid program, it is evident at 
once from Table IV that there was practically no acceptance on 
the part of these individuals for each other for this activity. 
As a matter of fact, only one of them voted for another one in his 
own group and this choice was unreciprocated. Naturally, this 
meant no mutual preferences whatever. 

The fact that the low group members designated each other 
for playmates to a much greater extent than they chose each other 
as partners for the Quiz Kid program gives added emphasis to the 
point previously discussed in regard to play activities offering 
many more opportunities for establishing inter-personal ties than 
do those functions involving knowledge and skill. Equally 
important, however, if not more so, is the finding that in both 
kinds of choosing situations the low group members showed very 
little identification with each other. Of course, those who were 
actually low in the kind of abilities required to do well in a Quiz 
Kid program could not be blamed for voting for others as possible 
partners for themselves who were known to possess such abilities. 
In fact, it was the smart thing to do. However, this does not 
change the fact of lack of respect or admiration among the low 
group members for each other. 

The data just presented on the choices of the low group members 
are factual evidence in support of the psychological theory in 
regard to the process of identification going up a group scale of 
prestige rather than down. 

Another aspect of the present data bears on the matter of 
emotional expansiveness,* i.e., the number of choices which each 
child gave to all other members of his particular class. The 
findings on this point are significant in that they are related to the 
question of degree of positive interest in others. It may be 
assumed that those children who wrote down the largest number 
of names in the two choosing situations were showing a greater 
amount of out-going responsiveness, or emotional expansiveness, 
toward others than was shown by those who wrote down only a 
few names. Proceeding from this assumption the question may 
be asked whether or not those who received the greatest number 
of choices from others were characterized by more emotional 
expansiveness than were those who were less favored by their 
associates. Table V gives the answer to this question for the 
high, middle, and low groups. 





* This term is borrowed from Helen H. Jennings’ Leadership and Isolation, 
a volume which presents a pioneering account of the operation of the choice 
process. 
TABLE V.—AVERAGE NUMBER OF CHOICES OF OTHER CHILDREN 
MADE BY DIFFERENT GROUPS IN LEVELS OF SOCIAL ACCEPTANCE 

                  Playmates   Quiz Kids 
High groups          7.0         7.1 
Middle groups        5.8         6.2 
Low groups           5.0         5.7 




















It will be evident from the data of Table V that the highest 
groups from the standpoint of both playmate and Quiz Kid 
choices were characterized by more emotional expansiveness than 
either of the two lower groups. There was very little differ- 
ence between the middle group and the lowest group on either 
measurement. 

In order to determine the statistical significance of the differences 
between the means of the high groups and the means of the low 
groups, as given in Table V, the critical ratio of each difference 
was computed. The critical ratio for the difference between the 
means of the high and low groups in the choosing of playmates 
was 1.43. The same calculation for the high and low Quiz Kid 
groups produced a critical ratio of 2.60. Neither of these ratios 
reaches the highest degree of statistical reliability, although the 
second one is close to the standard of 3. 
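
A critical ratio is conventionally the difference between two means divided by the standard error of that difference. The sketch below assumes independent groups and a standard error of the form √(s₁²/n₁ + s₂²/n₂); the report gives neither the standard deviations nor the exact formula, so the score lists are invented and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def critical_ratio(group_a, group_b):
    """Difference between two means divided by the standard error of the difference."""
    se_diff = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / se_diff


# Hypothetical numbers of choices made by members of two groups.
high = [9, 7, 8, 6, 5, 7]
low = [5, 4, 6, 5, 3, 7]

cr = critical_ratio(high, low)
print(f"critical ratio = {cr:.2f}")
print("reaches the standard of 3" if cr >= 3 else "falls short of the standard of 3")
```

With these invented scores the ratio falls between 2 and 3, which is the same kind of outcome reported above for the Quiz Kid groups.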

Probably the safest conclusion to reach is that the poor accept- 
ance accorded to the two low groups was not due to lack of desire 
on their part for acceptance by others, but rather to their lack of 
resources to arouse responsiveness toward themselves. 

It is impossible to say whether the greater amount of outgoing 
responsiveness actually registered by the high group members was 
a factor in determining their superior group acceptance, or 
whether their superior acceptance was a factor in causing them to 
develop responsive dispositions. No doubt there was a reciprocal 
relationship between the two. 

Finally, attention may be turned to the question of the choice 
process among the middle group members. In the interest of 
saving space in this report, the data for this middle group were 
not intensively analyzed for presentation. Furthermore, it was 
felt that the data for this group, if presented, would not add any- 
thing of significance over that already given. Enough checking 
on the middle-group choices was done to discover that, like those 
in the low group, there were a good many more unreciprocated 
than mutual choices for each other among the middle group’s 
own membership. Only the high group members showed an 
excess of mutual over unreciprocated choices. As would be 
expected, the middle group members showed more identification 
with themselves than did the low group individuals, but primarily 
their choices went to those in the top ranking groups in their 
choosing for both playmates and the Quiz Kid program. The 
process of identification (especially when preference and not just 
toleration is involved) does not run from the bottom of a group to 
the top with off-shoots evenly distributed all along the way. 
Rather there is a heavy skewing of the emotional expansiveness of 
the lower and middle groups toward those in the top brackets. 


CONCLUDING REMARKS 


Although the number of subjects in this study is small, and 
too small to warrant intensive statistical analysis, it seems proba- 
ble that the findings do apply to larger populations of a similar 
nature. They are in accord with the results reported by H. H. 
Jennings on a population of one hundred thirty-three subjects in 
the New York State Training School for Girls.* Also, they are 
in accord with psychological theory on the structure of groups, 
and in respect to play as a socializing agency. 

Furthermore, the lack of identification of the lower group mem- 
bers with themselves is corroborated by observations on much 
larger social groups such as Negroes, and the economically 
impoverished white people. 

If the question be asked how those in positions of low prestige 
can be helped to attain a greater identification with others in their 
class, the primary answer would seem to be to help them to help 
themselves toward greater personal and group achievements of 
which they, as well as others, may be proud. This is the function 
of schools, churches, clubs, and other community agencies, 
particularly when children are concerned. It seems evident, too, 
that any kind of organization, whether on the childhood or adult 
level, which is capable of arousing a strong feeling of group loyalty, 
or which provides its members with a realization of power, such as 
labor unions, is an important factor in producing an in-group 
identification. 

* Ibid. 

It has been demonstrated and emphasized theoretically that 
no society, or subgroup within that society, can prosper to the 
fullest extent either materially or spiritually, unless all 
its members prosper and thereby contribute to the welfare of the 
whole. Therefore, it should be the aim of educators and social 
psychologists to promote practices which will enable individuals 
in all categories of social status to inspire some degree of admira- 
tion from their respective peers and to establish inter-personal 
bonds between them. As this is done there will be a greater 
degree of acceptance among all members of a group regardless of 
original social status. 


THE RELATIONSHIP BETWEEN PERCEPTUAL 
SPAN AND RATE OF READING¹ 


JEAN SUTHERLAND 
Butler University 


PURPOSE OF THE EXPERIMENTS 


The evidence regarding the rôle of perceptual span in reading is 
not clear-cut. There are individual differences in perceptual 
span and it seems fairly clear that these are related to reading 
ability. However, we need further evidence before we can say 
what effect training in perceptual span will have on reading. 

Therefore, the purpose of this investigation was to determine 
(a) the relationship between perceptual span and rate of reading 
and (as a subsidiary objective) the relationship between per- 
ceptual span and rate of perception; and (b) the effect of syste- 
matic training in perceptual span upon rate of reading and upon 
improvability of rate of reading. 


EXPERIMENT A. THE DEGREE OF RELATIONSHIP BETWEEN 
PERCEPTUAL SPAN AND RATE OF READING 


Procedure.—In order to determine the degree of relationship 
between perceptual span and rate of reading, rate of reading and 
perceptual span were determined for a group of subjects and the 
coefficients of correlation between the two sets of scores were 
computed. The subjects were one hundred twenty-five university 
students, most of whom were enrolled in the introductory course 
in general psychology. 

The tests used to measure rate of reading and rate of perception 
were the following: 

1) Minnesota Speed of Reading Test for College Students 
2) Blommers’ Rate of Comprehension of Reading² 


¹ A digest of a dissertation submitted in partial fulfillment of the 
requirements for the Degree of Doctor of Philosophy in the College of 
Education, in the Graduate College of the State University of Iowa. 
(July, 1945.) The writer is indebted to Dr. James B. Stroud for his 
assistance in planning and conducting this research. 

² Paul J. Blommers, “Rate of Comprehension of Reading: Its Measurement 
and Its Relation to Comprehension,” Journal of Educational Psychology, 
1944, Vol. 35, pp. 449-473. 
3) An unstandardized test from the Wilking-Webster Reading Manual¹ 
4) Stroud’s Rate of Visual Perception Test² 

Perceptual span was measured by means of a pendulum-type 
tachistoscope. The test required from thirty-five to sixty min- 
utes to administer. It involved a maximum of two hundred 
thirty-three separate words and phrases of varying lengths 
presented to each subject individually. The words and phrases 
ranged in length from three to eighteen spaces. They were 
selected from textbooks and newspapers and were all familiar in 
meaning. The following are examples: 


among                    swamped with work 
auction                  delicious beverage 
sunny room               mountainous regions 
dry weather              entirely independent 
at a glance              the intelligent woman 
amusement park           the best living conditions 
from the context         an active and growing enterprise 


Each word or phrase was printed on plain white gummed paper 
in twelve-point Garamond type and then pasted on white cards 
514” by 544”. The cards were then arranged according to the 
length of the words or phrases and were presented to the subjects 
in that order. The test for each subject continued until he failed 
five consecutive trials. The tachistoscope was adapted to the 
experiment as follows: Covering the pendulum and attached 
to it was a white cardboard screen with a small aperture (3’’ by 
5’’) in the center, of such a length as to expose the material for 
100 ms. As the pendulum with the screen attached swung from 
the subject’s left to his right side, the word- or phrase-card placed 
in a rack behind the screen and at the bottom of the arc was 
exposed. The length of aperture required for the desired exposure 
time was determined in the following way: A beam of light was 
cast upon a sensitive film attached to the screen of the pendulum. 
Between the film and the light was a fan driven by a synchronous 
motor at such a rate as to interrupt the beam of light every 5 ms. 





¹ S. V. Wilking and R. G. Webster, A College Developmental Reading 
Manual, Houghton Mifflin Company, 1943, pp. 53-54, 72-73. 

² J. B. Stroud, “Rate of Visual Perception as a Factor in Rate of 
Reading,” Journal of Educational Psychology, 1945, Vol. 36, pp. 487-488. 
The length of film required for twenty such interruptions was 
taken as the aperture required for an exposure of 100 ms. The 
time was set at 100 ms. in order to preclude the subject’s making 
more than one fixation. 
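
The calibration just described reduces to simple arithmetic: an exposure of 100 ms. at one interruption every 5 ms. corresponds to twenty interruptions, and the aperture is the length of film they cover. The sketch below is an illustration only. The positions of the marks on the film are invented, the function names are hypothetical, and treating twenty interruptions as twenty intervals between marks is an assumption.

```python
def interruptions_for_exposure(exposure_ms, interval_ms):
    """Number of beam interruptions that span the desired exposure time."""
    return round(exposure_ms / interval_ms)


def aperture_length(mark_positions, exposure_ms=100.0, interval_ms=5.0):
    """Length of film covered by the required number of interruptions."""
    n = interruptions_for_exposure(exposure_ms, interval_ms)
    if len(mark_positions) <= n:
        raise ValueError("not enough interruption marks recorded on the film")
    return mark_positions[n] - mark_positions[0]


# Hypothetical film record: one mark every 0.25 units of length.
marks = [0.25 * i for i in range(30)]

count = interruptions_for_exposure(100.0, 5.0)
length = aperture_length(marks)
print(count, length)
```

Because the length is read directly from the recorded marks, the method does not require the speed of the pendulum to be known in advance.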

In front of the entire apparatus was a gray screen with a 
rectangular opening at which the stimulus words appeared. The 
purpose of the gray screen was to hide all the equipment from the 
subjects’ view, so that they would not be aware of the swinging 
pendulum except at the moment of exposure. 

Light fell on the exposed material from two windows behind the 
subject. In addition a 75-watt bulb was placed below the expo- 
sure card behind the gray screen. 

The tachistoscope was operated by means of a telegraph key 
which the subject pressed with his left hand. He watched the 
aperture for the exposed print and read the word or phrase aloud. 
The pendulum was swung back to its original position by the 
experimenter and a new card was inserted. This process was con- 
tinued for the entire test. Although the exposure time was con- 
stant, the subject could regulate, more or less, the speed at which 
he worked since he controlled the telegraph key. The aperture in 
the screen concealing the tachistoscope was approximately at the 
eye-level of the subject. The subject was allowed to adjust the 
distance between himself and the screen to meet his individual 
needs, in this way approximating ordinary reading conditions. 

The experimenter described the procedure to the subjects in 
something like the following terms: 

“You are to be given a test of perceptual span. The 
purpose of the test is to find out how many words you 
can read in 100 ms.—that is, in one glance. The word or 
phrase will appear in this opening after you press this 
key. You are to tell me what you read. Then I shall 
change the word-card and you may read the next one, 
and so on. The first five cards are trial cards and will 
not count in your final score. You may practice on 
them until you can read them correctly.” 


The subject’s score on the perceptual span test was determined 
by counting the number of cards correctly read. A card was 
not counted unless it was read exactly as printed. 
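
The scoring rule can be sketched as follows: a card counts only if it is read exactly as printed, and testing stops after five consecutive failures. The cards are taken from the examples given earlier, but the responses are invented and the function name is hypothetical; trial cards are assumed to have been set aside already.

```python
def perceptual_span_score(cards, responses, stop_after=5):
    """Count cards read exactly as printed, stopping after
    `stop_after` consecutive failures."""
    score = 0
    consecutive_failures = 0
    for printed, spoken in zip(cards, responses):
        if spoken == printed:
            score += 1
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures == stop_after:
                break
    return score


cards = [
    "among", "auction", "sunny room", "dry weather", "at a glance",
    "amusement park", "from the context", "swamped with work",
    "delicious beverage", "mountainous regions", "entirely independent",
]
# Hypothetical responses: one early slip, then five failures in a row.
responses = [
    "among", "auction", "sunny room", "dry feather", "at a glance",
    "amusement", "from context", "swamped", "delicious",
    "mountain regions", "entirely independent",
]
score = perceptual_span_score(cards, responses)
print(score)
```

The last card in this invented record would have been read correctly, but it is never reached because the fifth consecutive failure ends the test.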

Results.—Coefficients of correlation computed between scores 
on the perceptual span test and on each of the reading rate tests 
described above ranged from .31 to .37. The correlation between 
perceptual span and the rate of perception test was .70. 

These correlations seem to be large enough to suggest the 
hypothesis that training calculated to improve perceptual span 
will also improve rate of reading and rate of perception. In order 
to test this hypothesis a training program (Experiment B) was set 
up to give students training in perceptual span. 


EXPERIMENT B. THE EFFECT OF TRAINING IN PERCEPTUAL SPAN 
ON RATE OF READING AND ON RATE OF PERCEPTION 


Procedure.—The purpose of Experiment B was to answer the 
following questions: Can perceptual span be increased by train- 
ing? Will training in perceptual span improve rate of reading 
and rate of perception? Do students who have been given 
perceptual span training profit more from other training in read- 
ing subsequently administered than students who at the beginning 
of this training had not had the perceptual span training? 

Three groups of college freshmen were used in the investigation. 
All groups were drawn randomly from those freshmen enrolled 
in the Liberal Arts College at the State University of Iowa who 
scored below the mean in rate of reading on the entrance tests. 
In Group I there were thirty-five subjects; in Group II, forty- 
three subjects, and Group III, forty-one subjects. 

Each of the thirty-five students in Group I was given twelve 
training sessions of fifteen minutes each, extending over a period of 
three weeks, in perceptual span. About 2000 word- and phrase-cards 
were presented tachistoscopically in the same manner as 
described in Experiment A. During this training period the 
students were encouraged and urged to try to improve their per- 
ceptual span—that is, to increase the length of phrases that could 
be reproduced following 100 ms. exposures. The length of the 
word or phrase to be read was increased gradually as the student’s 
perceptual span increased, so that it was always slightly beyond 
the level which the student could achieve regularly. 

The three reading tests and the rate of perception test used in 
Experiment A were given to the students in the experimental 
group at the beginning and immediately after this training period. 
These test scores were used for purposes of assessing the effect of 
training in perceptual span upon rate of reading. 

The subjects of Group II formed a control group. To them 
were administered the Minnesota Speed of Reading Test and the 
Blommers Rate of Comprehension of Reading Test without the 
intervening training in perceptual span and without intervening 
training in reading. Alternate forms of the tests were admin- 
istered to the subjects at the end of the period. The forms were 
counterbalanced. 

The subjects who comprised Group III were given the Michigan 
Speed of Reading Test and the Iowa Silent Reading Test, and 
were then enrolled in regular sections of the freshman reading 
class. These sections met four days a week, one hour each, for 
one-half semester. The sections were taught by the regular 
reading instructors. The instruction, largely group, centered 
chiefly around the Harvard Reading Training Films and the Wilk- 
ing and Webster College Developmental Reading Manual. Alter- 
nate forms of the speed of reading tests were administered at the 
conclusion of the training. 

As noted earlier, one of the purposes of Experiment B was to 
determine for a given group of subjects the effect of initial training 
in perceptual span upon rate of subsequent improvement in rate 
of reading in a special training program as compared with a com- 
parable group that had not had training in perceptual span. 
Groups I and III are used for this purpose. 

Following the schedule of training in perceptual span described 
above, the subjects of Group I were transferred to the class in 
reading. Two sections made up entirely of the subjects of Group 
I were formed and given reading instruction in the same manner 
as the subjects of Group III. The subjects of Group I, as those of 
Group III, were tested on the Iowa Silent Reading Test and the 
Michigan Speed of Reading Test before and immediately after 
participating in the reading program. 

Prior to the training in perceptual span, tests were given in 
reading and in rate of perception. Alternate forms of the reading 
test were administered at the conclusion of the training in per- 
ceptual span. The rate of perception test was repeated, since 
there was no alternate form. The names of the tests and mean 
scores before and after training are shown in Table I. All of the 
differences between the means are significant at the one-per-cent 
level. 


TABLE I.—DIFFERENCE BETWEEN MEANS ON TESTS GIVEN TO 
EXPERIMENTAL GROUP BEFORE AND AFTER THE 
PERCEPTUAL SPAN TRAINING 

                                                Mean       Mean 
Tests                                           Before     After 
                                                Training   Training 

Minnesota Speed of Reading                       15.68      19.78 
Stroud’s Rate of Perception                     157.75     174.95 
Blommers’ Rate of Comprehension of Reading       42.38      54.67 
Informal Test from Wilking-Webster Manual       270.01     399.90 
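
The size of each gain in Table I can be read off with a short calculation. The means below are those printed in the table; the gains and percentages are derived here for illustration and are not figures reported by the author. Since the four tests are scored in different units, the percentages are only a rough basis for comparison.

```python
# Mean scores from Table I: (before training, after training).
TABLE_I = {
    "Minnesota Speed of Reading": (15.68, 19.78),
    "Stroud's Rate of Perception": (157.75, 174.95),
    "Blommers' Rate of Comprehension of Reading": (42.38, 54.67),
    "Informal Test from Wilking-Webster Manual": (270.01, 399.90),
}


def gains(table):
    """Gain in mean score for each test, rounded to two decimals."""
    return {test: round(after - before, 2) for test, (before, after) in table.items()}


table_gains = gains(TABLE_I)
for test, gain in table_gains.items():
    before = TABLE_I[test][0]
    print(f"{test}: gain {gain:.2f} ({100 * gain / before:.1f}% of initial mean)")
```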


It seems probable that some of the gains shown in Table I are 
due to factors other than training in perceptual span. For one 
thing, there is a practice effect; for another, there may be a 
regression effect. The Minnesota and the Blommers tests were 
administered to a control group and readministered, in alternate 
form, one-half semester later, as in the case of the experimental 
group. It turned out that the difference between the initial and 
final mean of the control group on the Blommers test was signifi- 
cant at the one-per-cent level, and on the Minnesota test, at the 
ten-per-cent level. However, differences in amount of gain as 
between the experimental group and the control group were 


TABLE II.—COMPARISONS OF GROUPS I AND III ON READING 
SCORES MADE BEFORE AND AFTER A COURSE IN READING 
INSTRUCTION 

                                      Group I             Group III 
Tests                             Initial   Final     Initial   Final 
                                  Mean      Mean      Mean      Mean 

Iowa Silent Reading (Rate ...)     92.20    109.78     85       106.98 
Michigan Speed of Reading          47.62     53.11     47.00     57.13 
significant at the ten-per-cent level on the Blommers test and at 
the one-per-cent level on the Minnesota test. 

FIG. 1. The Daily Average Rate Scores for Group I and Group III. 
(Vertical axis: words per minute, 250 to 600; horizontal axis: tests 
1 to 14.) 

Subsequent to the termination of perceptual span training the 
experimental group, Group I, was placed in regular reading 
classes, as described previously. The progress of this group was 
compared with that of Group III, a comparable group, receiving 
comparable reading instruction, the only difference in method 
being that the subjects of Group III had not had prior training in 
perceptual span. Both groups made gains on the two tests given 
(Table II) significant at the one-per-cent level. However, the 
gains of Group III exceeded those of Group I. 

Accompanying the Reading Training Films is a series of reading 
selections equated in content with the reading matter presented 
in the films. These equated reading selections were administered 
at a rate of about two per week throughout the period of reading 
instruction. Progress of the two groups is shown graphically 
(Figure 1) in words read per minute. It is seen that on these 
informal reading tests the group that had had basic training in 
perceptual span made more rapid initial progress, although this 
superiority was largely lost at the end of the period. It should 
also be noted that these curves represent progress in speed in a 
very narrow function, with a high degree of continuity in mental 
set from selection to selection and a high degree of similarity in 
material. It should not be understood that these large gains 
represent gains that extend to reading in general. 

It should be noted that the subjects of Group I and those of 
Group III comprised different sections of the class in reading. It 
appears, from Figure 1, that Group I must have lost interest 
before the instruction was terminated. If this is the case, it 
could be due to the relatively long period of instruction to which 
this group was subjected or to certain unfavorable conditions of 
instruction peculiar to the sections which the subjects of Group 
I comprised. Figure 1 suggests that Group I made faster initial 
progress than Group III. In any event, it must be admitted 
that the question of the effect of training in perceptual span upon 
progress in reading instruction is left by these results without a 
decisive answer. 


SUMMARY 


The results of this investigation indicate that perceptual span is 
related to rate of reading and to rate of perception. They also 
indicate that training directed at the improvement of perceptual 
span and which accomplishes this end may also improve rate of 
reading and rate of perception. The results regarding the effi- 
ciency of training in perceptual span upon improvability in read- 
ing rate by direct instruction are inconclusive. There is a 
suggestion that the group that had previous training in perceptual 
span made faster initial progress in improvement in rate than a 
comparable group that had not had training in perceptual span. 

A need to predict accurately the probable success of applicants 
for admission to engineering colleges has always existed. But 
today, with engineering college facilities taxed to capacity and 
straining to provide a chance for education to both the veteran 
and non-veteran, it is increasingly important to select only those 
applicants as students who have the greatest chance of being 
benefitted by a technical training. 

From 1941 to 1943, the writer studied the selection problems of 
a New England technical college¹ with the intent of developing a 
method of detecting those applicants for admission who had the 
best chance of making a successful record in their freshman year. 
A review of this study is being presented, even at this late date, in 
the hope that consideration of the results, both positive and nega- 
tive, may help solve one of our postwar educational problems; and 
that it may be useful in building valid, enduring peacetime admis- 
sion procedures. 


CRITERION OF SUCCESS 


The criterion of academic success was ‘Tech marks,’ which were
a weighted average of percentage grades. The weights assigned 
to the various subjects were the same as those used by the college 
to determine class standing. The mathematics percentage grade 
was multiplied by five, science by three, engineering drawing by 
two, foreign language by two, history by two, and rhetoric by two. 
These were converted into standard scores. 
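The computation of the criterion described above may be sketched as follows. This is a minimal illustration and not the college's actual procedure: the function names and subject keys are hypothetical, and the standard score is assumed to be the usual z-score (deviation from the group mean divided by the group standard deviation), since the text does not state which standard-score scale was used.

```python
from statistics import mean, pstdev

# Subject weights as given in the text.
SUBJECT_WEIGHTS = {
    "mathematics": 5,
    "science": 3,
    "engineering_drawing": 2,
    "foreign_language": 2,
    "history": 2,
    "rhetoric": 2,
}


def tech_mark(percent_grades):
    """Weighted average of one student's percentage grades."""
    total_weight = sum(SUBJECT_WEIGHTS.values())
    weighted_sum = sum(
        weight * percent_grades[subject]
        for subject, weight in SUBJECT_WEIGHTS.items()
    )
    return weighted_sum / total_weight


def standard_scores(values):
    """Assumed z-score form: (value - group mean) / group SD."""
    center = mean(values)
    spread = pstdev(values)  # population SD; an assumption, not from the text
    return [(value - center) / spread for value in values]
```

Because mathematics carries five of the sixteen weight units, a student's mathematics grade moves the Tech mark more than any other single subject.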

Predictive agents studied: 1) A.C.E. Coöperative General
Achievement Test in Mathematics, hereafter called ‘Math Test.’

2) A.C.E. Coöperative General Achievement Test in Physics or
Chemistry (the testee could select either of the two sciences in 
which he wished to be tested), hereafter called ‘Science Test.’ 

3) A.C.E. Coöperative General Achievement Test in Reading
Comprehension. 

4) The Iowa Silent Reading Test. 

5) The Yale University Department of Personnel Study Test 
II, Form J, part 1, hereafter called ‘Yale 1.’ 

6) The Yale University Department of Personnel Study Test 
II, Form J, part 2, hereafter called ‘Yale 2.’ 

7) A Studiousness Questionnaire devised by Dr. Vernon Jones 
of Clark University. 

8) High-school marks. 


THE APTITUDE TEST BATTERY 


All the tests were administered to the one hundred fifty-six 
freshmen who entered Worcester Polytechnic Institute in Septem- 
ber, 1941, and completed the academic year. Zero-order correla- 
tions between the seven tests and first semester marks, and 
inter-correlations between the tests were obtained. By the use 
of partial regression and multiple correlational techniques, it 
was found that the tests combined in the following way: 


\[
\frac{2\,(\text{Math test standard score}) + 2\,(\text{Science test standard score}) + \text{Yale 1 standard score} + \text{Yale 2 standard score}}{6}
\]


would correlate most highly with the criterion. The actual 
correlation was .55 between this battery and first semester marks. 
The reading tests and the studiousness questionnaire added noth- 
ing more of predictive value, and so were dropped from the 
battery. The final test battery took eighty minutes to admin-
ister, and consisted of the A.C.E. Math and Science achievement
tests, and parts one and two of the Yale Spatial Relations Test II,
Form J.
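The weighting of the final battery may be illustrated by a short sketch. The function and argument names are hypothetical; the inputs are assumed to be standard scores already computed for each of the four tests.

```python
def battery_score(math_z, science_z, yale1_z, yale2_z):
    """Composite battery score: Math and Science doubled, Yale parts
    counted once, and the sum divided by the total weight of six."""
    return (2 * math_z + 2 * science_z + yale1_z + yale2_z) / 6
```

For a hypothetical applicant with standard scores of 1.5 in Math, 0.5 in Science, -1.0 on Yale 1 and 0.0 on Yale 2, the composite would be (3 + 1 - 1 + 0) / 6 = 0.5.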


TREATMENT OF HIGH-SCHOOL MARKS 


Although it has been shown by many investigations that high- 
school marks are the best single indicators of collegiate success, no 
investigation has shown, to the writer’s knowledge, how high- 
school marks should be evaluated or how marks from one school 
could be compared with those of another. In order to equalize 
the grades from different preparatory and high schools, the ‘cer-
tification grade’² of the school the applicant came from was sub-
tracted from the percentage average of the applicant’s high-school
marks. This excess of the average high-school mark over the
certification grade will hereafter be referred to as ‘high-school marks.’
These high-school marks correlated .41 with first semester Tech
marks.

By breaking down this statistic and analyzing individual cases, 
several things became apparent. First, those schools which had 
extremely low certification grades, about 60, allowed the high- 
school marks to be too high. Those schools that had certification 
grades that were too high, above 85, made the high-school marks 
too low. Empirically, the following correction seemed necessary:
raise all certification grades below 70 to 70; lower all certification
grades above 85 to 85. Peg the certification grade of all Worcester,
Massachusetts, high schools but the High School of Commerce at 
80, making the exception 85. 
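The correction just described amounts to a simple rule, which may be sketched as follows. The function names and the two flags identifying Worcester schools are hypothetical conveniences for illustration; only the numerical limits come from the text.

```python
def corrected_certification_grade(cert_grade, worcester_school=False,
                                  high_school_of_commerce=False):
    """Apply the empirical correction to a school's certification grade."""
    if worcester_school:
        # Worcester high schools are pegged at 80, except Commerce at 85.
        return 85.0 if high_school_of_commerce else 80.0
    # All other schools: raise grades below 70 to 70, lower those above 85 to 85.
    return min(max(float(cert_grade), 70.0), 85.0)


def high_school_mark(percent_average, cert_grade, **school_flags):
    """Excess of the applicant's average over the corrected certification grade."""
    return percent_average - corrected_certification_grade(cert_grade, **school_flags)
```

Under this rule an applicant averaging 88 from a school certifying at 60 would receive a high-school mark of 18 rather than 28, since the certification grade is first raised to 70.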

Second, individual analysis also showed that the average of the 
high-school mathematics and science (chemistry and/or physics) 
marks minus the corrected certification grade had more predictive 
value than considering the average of all high-school marks. 
This average of the high-school mathematics and science marks 
minus the corrected certification grade correlated .51 with first 
semester Tech marks and .57 with the final aptitude test battery 
which, as noted above, consisted of mathematics and science
achievement tests, and a spatial relations test. This low correla- 
tion might indicate that math and science high-school grades, at 
least as corrected in this study, are determined by factors other 
than achievement and ability; and the rather large correlation 
with college marks might indicate that those intangible factors of 
personality, studiousness, etc. necessary for success in college are 
measured by these corrected high-school marks (the average math 
and science marks minus the corrected certification grade). 


THE PREDICTIVE INDEX 


By regression equations it was found that in the ‘Predictive
Index,’ the standard scores attained on the aptitude test battery
should be given a weight of three and the average of the high-
school math and science marks, minus the corrected certification
grade, converted into standard scores, should be given a weight of
two. The sum of these weighted scores was then divided by five.
This Predictive Index correlated .71 with first semester marks at
Worcester Polytechnic Institute and .64 with the first year marks
of the 1941 freshmen.

² As determined by the New England College Entrance Certificate Board.
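The Predictive Index may be sketched in the same manner. The function and argument names are hypothetical, and both inputs are assumed to be standard scores computed over the same group of applicants.

```python
def predictive_index(battery_z, hs_math_science_z):
    """Weight the battery standard score by three and the corrected
    high-school math and science standard score by two, then divide by five."""
    return (3 * battery_z + 2 * hs_math_science_z) / 5
```

A hypothetical applicant one standard deviation above the mean on the battery and half a standard deviation below the mean on corrected high-school marks would have an index of (3 - 1) / 5 = 0.4.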


THE 1942 FRESHMAN CLASS 


The Predictive Index was developed from the data obtained 
from the 1941 freshman class, and all the correlations given above
were those obtained from studying the 1941 group. Correlation 
between the Predictive Index and the first semester marks of the 
one hundred eighty 1942 freshmen who completed the first 
semester dropped from .71 to .53. The aptitude test battery 
alone correlated .51 with these first semester marks, and the 
average of the high-school math and science marks minus the 
corrected certification grades correlated .48 with the first semester 
marks and .34 with the aptitude test battery. Figure 1 sum-
marizes these correlations for both the 1941 and 1942 freshman
classes.


                              Tech     High-school   Test
1941 group                    marks    marks         battery
Tech marks, first semester      X
High-school marks              .51        X
Test battery                   .57       .18           X

                              Tech     High-school   Test
1942 group                    marks    marks         battery
Tech marks, first semester      X
High-school marks              .48        X
Test battery                   .51       .34           X


Fig. 1. Summary of Correlations. 











